MOLECULES FOR DIAGNOSTICS AND THERAPEUTICS



TECHNICAL FIELD

The present invention relates to human molecules for diagnostics and therapeutics and to the
use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with
human molecules.



BACKGROUND OF THE INVENTION 
The human genome is comprised of thousands of genes, many encoding gene products that
function in the maintenance and growth of the various cells and tissues in the body. Aberrant
expression or mutations in these genes and their products is the cause of, or is associated with, a variety
of human diseases such as cancer and other cell proliferative disorders, autoimmune/inflammatory
disorders, infections, developmental disorders, endocrine disorders, metabolic disorders, neurological
disorders, gastrointestinal disorders, transport disorders, and connective tissue disorders. The
identification of these genes and their products is the basis of an ever-expanding effort to find markers
for early detection of diseases, and targets for their prevention and treatment. Therefore, these genes
and their products are useful as diagnostics and therapeutics. These genes may encode, for example,
enzyme molecules, molecules associated with growth and development, biochemical pathway molecules,
extracellular information transmission molecules, receptor molecules, intracellular signaling molecules,
membrane transport molecules, protein modification and maintenance molecules, nucleic acid synthesis
and modification molecules, adhesion molecules, antigen recognition molecules, secreted and
extracellular matrix molecules, cytoskeletal molecules, ribosomal molecules, electron transfer
associated molecules, transcription factor molecules, chromatin molecules, cell membrane molecules,
and organelle associated molecules.

For example, cancer represents a type of cell proliferative disorder that affects nearly every
tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the
cause of, or involved with, various cancers because tissue growth involves complex and ordered
patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to
maintain both the number of cells and their spatial organization. This regulation depends upon the
appropriate expression of proteins which control cell cycle progression in response to extracellular
signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or
nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into
several categories, including growth factors and their receptors, second messenger and signal
transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors.
Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders
such as cancer. Oncogenes are genes generally derived from normal genes that, through abnormal
expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis).
Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include
growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors,
and cell-cycle control proteins. In contrast, tumor-suppressor genes are involved in inhibiting cell
proliferation. Mutations which cause reduced function or loss of function in tumor-suppressor genes
result in aberrant cell proliferation and cancer. Although many different genes and their products have
been found to be associated with cell proliferative disorders such as cancer, many more may exist that
are yet to be discovered.

DNA-based arrays can provide a simple way to explore the expression of a single polymorphic
gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays
are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor
gene array is used to determine whether individuals are carrying mutations that predispose them to
cancer. A cytochrome p450 gene array is useful to determine whether individuals have one of a number
of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity.

DNA-based array technology is especially relevant for the rapid screening of expression of a
large number of genes. There is a growing awareness that gene expression is affected in a global
fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the
expression of a large number of genes. In some cases the interactions may be expected, such as when
the genes are part of the same signaling pathway. In other cases, such as when the genes participate in
separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays
can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the
expression of a large number of genes.
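
As an illustration of how array data of this kind might be screened for globally affected genes, the following minimal sketch compares two sets of expression values and reports genes whose signal changes at least twofold. The gene names, intensity values, pseudocount, and threshold are all hypothetical choices made for the example; they are not taken from this document.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return {gene: log2 ratio} for genes measured in both conditions.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detectable signal.
    """
    shared = control.keys() & treated.keys()
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in shared
    }


def changed_genes(control, treated, threshold=1.0):
    """Return sorted gene names whose absolute log2 fold change meets the threshold."""
    changes = log2_fold_changes(control, treated)
    return sorted(gene for gene, value in changes.items() if abs(value) >= threshold)


# Hypothetical array intensities for three placeholder genes.
control = {"geneA": 99.0, "geneB": 49.0, "geneC": 199.0}
treated = {"geneA": 399.0, "geneB": 49.0, "geneC": 49.0}

print(changed_genes(control, treated))
```

A threshold of 1.0 on the log2 scale corresponds to a twofold increase or decrease; in practice the cutoff and any normalization would depend on the array platform.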

Enzyme Molecules

SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, and SEQ ID NO:5 encode, for
example, human enzyme molecules.

The cellular processes of biogenesis and biodegradation involve a number of key enzyme
classes including oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. These
enzyme classes are each comprised of numerous substrate-specific enzymes having precise and well
regulated functions. These enzymes function by facilitating metabolic processes such as glycolysis,
the tricarboxylic acid cycle, and fatty acid metabolism; synthesis or degradation of amino acids, steroids,
phospholipids, alcohols, etc.; regulation of cell signaling, proliferation, inflammation, apoptosis, etc.;
and through catalyzing critical steps in DNA replication and repair, and the process of translation.
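
The six enzyme classes named above correspond to the first digit of an Enzyme Commission (EC) number, such as the EC 2.6.1.64 entry cited later in this section. The sketch below shows one way to map an EC number to its top-level class; the function name and the handling of unrecognized input are assumptions of the example.

```python
# Top-level EC classes, limited to the six classes named in the text.
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}


def ec_class(ec_number):
    """Return the top-level enzyme class for an EC number such as 'EC 2.6.1.64'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    try:
        top_level = int(text.split(".")[0])
    except ValueError:
        return "unknown"
    return EC_CLASSES.get(top_level, "unknown")


print(ec_class("EC 2.6.1.64"))
```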

Oxidoreductases
Many pathways of biogenesis and biodegradation require oxidoreductase (dehydrogenase or
reductase) activity, coupled to the reduction or oxidation of a donor or acceptor cofactor. Potential
cofactors include cytochromes, oxygen, disulfide, iron-sulfur proteins, flavin adenine dinucleotide
(FAD), and the nicotinamide adenine dinucleotides NAD and NADP (Newsholme, E.A. and A.R.
Leech (1983) Biochemistry for the Medical Sciences, John Wiley and Sons, Chichester, U.K. pp.
779-793). Reductase activity catalyzes the transfer of electrons between substrate(s) and cofactor(s)
with concurrent oxidation of the cofactor. The reverse dehydrogenase reaction catalyzes the reduction
of a cofactor and consequent oxidation of the substrate. Oxidoreductase enzymes are a broad
superfamily of proteins that catalyze numerous reactions in all cells of organisms ranging from
bacteria to plants to humans. These reactions include metabolism of sugar, certain detoxification
reactions in the liver, and the synthesis or degradation of fatty acids, amino acids, glucocorticoids,
estrogens, androgens, and prostaglandins. Different family members are named according to the
direction in which their reactions are typically catalyzed; thus they may be referred to as
oxidoreductases, oxidases, reductases, or dehydrogenases. In addition, family members often have
distinct cellular localizations, including the cytosol, the plasma membrane, mitochondrial inner or
outer membrane, and peroxisomes.

Short-chain alcohol dehydrogenases (SCADs) are a family of dehydrogenases that only share
15% to 30% sequence identity, with similarity predominantly in the coenzyme binding domain and
the substrate binding domain. In addition to the well-known role in detoxification of ethanol, SCADs
are also involved in synthesis and degradation of fatty acids, steroids, and some prostaglandins, and
are therefore implicated in a variety of disorders such as lipid storage disease, myopathy, SCAD
deficiency, and certain genetic disorders. For example, retinol dehydrogenase is a SCAD-family
member (Simon, A. et al. (1995) J. Biol. Chem. 270:1107-1112) that converts retinol to retinal, the
precursor of retinoic acid. Retinoic acid, a regulator of differentiation and apoptosis, has been shown
to down-regulate genes involved in cell proliferation and inflammation (Chai, X. et al. (1995) J. Biol.
Chem. 270:3900-3904). In addition, retinol dehydrogenase has been linked to hereditary eye diseases
such as autosomal recessive childhood-onset severe retinal dystrophy (Simon, A. et al. (1996)
Genomics 36:424-430).
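
For readers unfamiliar with how a figure such as "15% to 30% sequence identity" is obtained, the sketch below computes percent identity from two sequences that have already been aligned. It is a minimal illustration under stated assumptions: the sequences are toy strings, gaps are written as "-", and identity is taken over ungapped aligned columns only (other conventions for the denominator exist).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped aligned positions at which the two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real SCAD sequences).
print(percent_identity("MKT-AYIAKQ", "MRTLAYLAK-"))
```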

Propagation of nerve impulses, modulation of cell proliferation and differentiation, induction
of the immune response, and tissue homeostasis involve neurotransmitter metabolism (Weiss, B.
(1991) Neurotoxicology 12:379-386; Collins, S.M. et al. (1992) Ann. N.Y. Acad. Sci. 664:415-424;
Brown, J.K. and H. Imam (1991) J. Inherit. Metab. Dis. 14:436-458). Many pathways of
neurotransmitter metabolism require oxidoreductase activity, coupled to reduction or oxidation of a
cofactor, such as NAD+/NADH (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the
Medical Sciences, John Wiley and Sons, Chichester, U.K. pp. 779-793). Degradation of
catecholamines (epinephrine or norepinephrine) requires alcohol dehydrogenase (in the brain) or
aldehyde dehydrogenase (in peripheral tissue). NAD+-dependent aldehyde dehydrogenase oxidizes
5-hydroxyindole-3-acetate (the product of 5-hydroxytryptamine (serotonin) metabolism) in the brain,
blood platelets, liver and pulmonary endothelium (Newsholme, supra, p. 786). Other
neurotransmitter degradation pathways that utilize NAD+/NADH-dependent oxidoreductase activity
include those of L-DOPA (precursor of dopamine, a neuronal excitatory compound), glycine (an
inhibitory neurotransmitter in the brain and spinal cord), histamine (liberated from mast cells during
the inflammatory response), and taurine (an inhibitory neurotransmitter of the brain stem, spinal cord
and retina) (Newsholme, supra, pp. 790, 792). Epigenetic or genetic defects in neurotransmitter
metabolic pathways can result in a spectrum of disease states in different tissues including Parkinson
disease and inherited myoclonus (McCance, K.L. and S.E. Huether (1994) Pathophysiology, Mosby-
Year Book, Inc., St. Louis MO, pp. 402-404; Gundlach, A.L. (1990) FASEB J. 4:2761-2766).

Tetrahydrofolate is a derivatized glutamate molecule that acts as a carrier, providing activated
one-carbon units to a wide variety of biosynthetic reactions, including synthesis of purines,
pyrimidines, and the amino acid methionine. Tetrahydrofolate is generated by the activity of a
holoenzyme complex called tetrahydrofolate synthase, which includes three enzyme activities:
tetrahydrofolate dehydrogenase, tetrahydrofolate cyclohydrolase, and tetrahydrofolate synthetase.
Thus, tetrahydrofolate dehydrogenase plays an important role in generating building blocks for
nucleic and amino acids, crucial to proliferating cells.

3-Hydroxyacyl-CoA dehydrogenase (3HACD) is involved in fatty acid metabolism. It
catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, with concomitant reduction of
NAD+ to NADH, in the mitochondria and peroxisomes of eukaryotic cells. In peroxisomes, 3HACD
and enoyl-CoA hydratase form an enzyme complex called bifunctional enzyme, defects in which are
associated with peroxisomal bifunctional enzyme deficiency. This interruption in fatty acid
metabolism produces accumulation of very-long chain fatty acids, disrupting development of the
brain, bone, and adrenal glands. Infants born with this deficiency typically die within 6 months
(Watkins, P. et al. (1989) J. Clin. Invest. 83:771-777; Online Mendelian Inheritance in Man (OMIM),
#261515). The neurodegeneration that is characteristic of Alzheimer's disease involves development
of extracellular plaques in certain brain regions. A major protein component of these plaques is the
peptide amyloid-β (Aβ), which is one of several cleavage products of amyloid precursor protein
(APP). 3HACD has been shown to bind the Aβ peptide, and is overexpressed in neurons affected in
Alzheimer's disease. In addition, an antibody against 3HACD can block the toxic effects of Aβ in a
cell culture model of Alzheimer's disease (Yan, S. et al. (1997) Nature 389:689-695; OMIM
#602057).

Steroids, such as estrogen, testosterone, corticosterone, and others, are generated from a
common precursor, cholesterol, and are interconverted into one another. A wide variety of enzymes
act upon cholesterol including a number of dehydrogenases. Steroid dehydrogenases, such as the
hydroxysteroid dehydrogenases, are involved in hypertension, fertility, and cancer (Duax, W.L. and
D. Ghosh (1997) Steroids 62:95-100). One such dehydrogenase is 3-oxo-5-α-steroid dehydrogenase
(OASD), a microsomal membrane protein highly expressed in prostate and other androgen-responsive
tissues. OASD catalyzes the conversion of testosterone into dihydrotestosterone, which is the most
potent androgen. Dihydrotestosterone is essential for the formation of the male phenotype during
embryogenesis, as well as for proper androgen-mediated growth of tissues such as the prostate and
male genitalia. A defect in OASD that prevents the conversion of testosterone into
dihydrotestosterone leads to a rare form of male pseudohermaphroditism, characterized by defective
formation of the external genitalia (Andersson, S. et al. (1991) Nature 354:159-161; Labrie, F. et al.
(1992) Endocrinology 131:1571-1573; OMIM #264600). Thus, OASD plays a central role in sexual
differentiation and androgen physiology.

17β-hydroxysteroid dehydrogenase (17βHSD6) plays an important role in the regulation of
the male reproductive hormone, dihydrotestosterone (DHTT). 17βHSD6 acts to reduce levels of
DHTT by oxidizing a precursor of DHTT, 3α-diol, to androsterone which is readily glucuronidated
and removed from tissues. 17βHSD6 is active with both androgen and estrogen substrates when
expressed in embryonic kidney 293 cells. At least five other isozymes of 17βHSD have been
identified that catalyze oxidation and/or reduction reactions in various tissues with preferences for
different steroid substrates (Biswas, M.G. and D.W. Russell (1997) J. Biol. Chem. 272:15959-15966).
For example, 17βHSD1 preferentially reduces estradiol and is abundant in the ovary and
placenta. 17βHSD2 catalyzes oxidation of androgens and is present in the endometrium and placenta.
17βHSD3 is exclusively a reductive enzyme in the testis (Geissler, W.M. et al. (1994) Nat. Genet.
7:34-39). An excess of androgens such as DHTT can contribute to certain disease states such as
benign prostatic hyperplasia and prostate cancer.

Oxidoreductases are components of the fatty acid metabolism pathways in mitochondria and
peroxisomes. The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids,
while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty
acids. The auxiliary beta-oxidation enzyme 2,4-dienoyl-CoA reductase catalyzes the removal of
even-numbered double bonds from unsaturated fatty acids prior to their entry into the main
beta-oxidation pathway. The enzyme may also remove odd-numbered double bonds from unsaturated
fatty acids (Koivuranta, K.T. et al. (1994) Biochem. J. 304:787-792; Smeland, T.E. et al. (1992) Proc.
Natl. Acad. Sci. USA 89:6673-6677). 2,4-dienoyl-CoA reductase is located in both mitochondria and
peroxisomes. Inherited deficiencies in mitochondrial and peroxisomal beta-oxidation enzymes are
associated with severe diseases, some of which manifest themselves soon after birth and lead to death
within a few years. Defects in beta-oxidation are associated with Reye's syndrome, Zellweger
syndrome, neonatal adrenoleukodystrophy, infantile Refsum's disease, acyl-CoA oxidase deficiency,
and bifunctional protein deficiency (Suzuki, Y. et al. (1994) Am. J. Hum. Genet. 54:36-43; Hoeller,
supra; Cotran, R.S. et al. (1994) Robbins Pathologic Basis of Disease, W.B. Saunders Co.,
Philadelphia PA, p. 866). Peroxisomal beta-oxidation is impaired in cancerous tissue. Although
neoplastic human breast epithelial cells have the same number of peroxisomes as do normal cells,
fatty acyl-CoA oxidase activity is lower than in control tissue (el Bouhtoury, F. et al. (1992) J. Pathol.
166:27-35). Human colon carcinomas have fewer peroxisomes than normal colon tissue and have
lower fatty-acyl-CoA oxidase and bifunctional enzyme (including enoyl-CoA hydratase) activities
than normal tissue (Cable, S. et al. (1992) Virchows Arch. B Cell Pathol. Incl. Mol. Pathol. 62:221-226).
Another important oxidoreductase is isocitrate dehydrogenase, which catalyzes the conversion
of isocitrate to α-ketoglutarate, a substrate of the citric acid cycle. Isocitrate dehydrogenase can be
either NAD or NADP dependent, and is found in the cytosol, mitochondria, and peroxisomes.
Activity of isocitrate dehydrogenase is regulated developmentally, and by hormones,
neurotransmitters, and growth factors.

Hydroxypyruvate reductase (HPR), a peroxisomal 2-hydroxyacid dehydrogenase in the
glycolate pathway, catalyzes the conversion of hydroxypyruvate to glycerate with the oxidation of
both NADH and NADPH. The reverse dehydrogenase reaction reduces NAD+ and NADP+. HPR
recycles nucleotides and bases back into pathways leading to the synthesis of ATP and GTP. ATP
and GTP are used to produce DNA and RNA and to control various aspects of signal transduction and
energy metabolism. Inhibitors of purine nucleotide biosynthesis have long been employed as
antiproliferative agents to treat cancer and viral diseases. HPR also regulates biochemical synthesis
of serine and cellular serine levels available for protein synthesis.

The mitochondrial electron transport (or respiratory) chain is a series of oxidoreductase-type
enzyme complexes in the mitochondrial membrane that is responsible for the transport of electrons
from NADH through a series of redox centers within these complexes to oxygen, and the coupling of
this oxidation to the synthesis of ATP (oxidative phosphorylation). ATP then provides the primary
source of energy for driving a cell's many energy-requiring reactions. The key complexes in the
respiratory chain are NADH:ubiquinone oxidoreductase (complex I), succinate:ubiquinone
oxidoreductase (complex II), cytochrome c1-b oxidoreductase (complex III), cytochrome c oxidase
(complex IV), and ATP synthase (complex V) (Alberts, B. et al. (1994) Molecular Biology of the
Cell, Garland Publishing, Inc., New York NY, pp. 677-678). All of these complexes are located on
the inner matrix side of the mitochondrial membrane except complex II, which is on the cytosolic
side. Complex II transports electrons generated in the citric acid cycle to the respiratory chain. The
electrons generated by oxidation of succinate to fumarate in the citric acid cycle are transferred
through electron carriers in complex II to membrane bound ubiquinone (Q). Transcriptional
regulation of these nuclear-encoded genes appears to be the predominant means for controlling the
biogenesis of respiratory enzymes. Defects and altered expression of enzymes in the respiratory chain
are associated with a variety of disease conditions.

Other dehydrogenase activities using NAD as a cofactor are also important in mitochondrial
function. 3-hydroxyisobutyrate dehydrogenase (3HBD), important in valine catabolism, catalyzes the
NAD-dependent oxidation of 3-hydroxyisobutyrate to methylmalonate semialdehyde within
mitochondria. Elevated levels of 3-hydroxyisobutyrate have been reported in a number of disease
states, including ketoacidosis, methylmalonic acidemia, and other disorders associated with
deficiencies in methylmalonate semialdehyde dehydrogenase (Rougraff, P.M. et al. (1989) J. Biol.
Chem. 264:5899-5903).

Another mitochondrial dehydrogenase important in amino acid metabolism is the enzyme
isovaleryl-CoA-dehydrogenase (IVD). IVD is involved in leucine metabolism and catalyzes the
oxidation of isovaleryl-CoA to 3-methylcrotonyl-CoA. Human IVD is a tetrameric flavoprotein that
is encoded in the nucleus and synthesized in the cytosol as a 45 kDa precursor with a mitochondrial
import signal sequence. A genetic deficiency, caused by a mutation in the gene encoding IVD, results
in the condition known as isovaleric acidemia. This mutation results in inefficient mitochondrial
import and processing of the IVD precursor (Vockley, J. et al. (1992) J. Biol. Chem. 267:2494-2501).

Transferases

Transferases are enzymes that catalyze the transfer of molecular groups. The reaction may
involve an oxidation, reduction, or cleavage of covalent bonds, and is often specific to a substrate or
to particular sites on a type of substrate. Transferases participate in reactions essential to such
functions as synthesis and degradation of cell components, regulation of cell functions including cell
signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Transferases are
involved in key steps in disease processes involving these functions. Transferases are frequently
classified according to the type of group transferred. For example, methyl transferases transfer
one-carbon methyl groups, amino transferases transfer nitrogenous amino groups, and similarly
denominated enzymes transfer aldehyde or ketone, acyl, glycosyl, alkyl or aryl, isoprenyl, saccharyl,
phosphorous-containing, sulfur-containing, or selenium-containing groups, as well as small
enzymatic groups such as Coenzyme A.

Acyl transferases include peroxisomal carnitine octanoyl transferase, which is involved in the
fatty acid beta-oxidation pathway, and mitochondrial carnitine palmitoyl transferases, involved in
fatty acid metabolism and transport. Choline O-acetyl transferase catalyzes the biosynthesis of the
neurotransmitter acetylcholine.

Amino transferases play key roles in protein synthesis and degradation, and they contribute to
other processes as well. For example, the amino transferase 5-aminolevulinic acid synthase catalyzes
the addition of succinyl-CoA to glycine, the first step in heme biosynthesis. Other amino transferases
participate in pathways important for neurological function and metabolism. For example,
glutamine-phenylpyruvate amino transferase, also known as glutamine transaminase K (GTK), catalyzes several
reactions with a pyridoxal phosphate cofactor. GTK catalyzes the reversible conversion of
L-glutamine and phenylpyruvate to 2-oxoglutaramate and L-phenylalanine. Other amino acid substrates
for GTK include L-methionine, L-histidine, and L-tyrosine. GTK also catalyzes the conversion of
kynurenine to kynurenic acid, a tryptophan metabolite that is an antagonist of the
N-methyl-D-aspartate (NMDA) receptor in the brain and may exert a neuromodulatory function. Alteration of the
kynurenine metabolic pathway may be associated with several neurological disorders. GTK also
plays a role in the metabolism of halogenated xenobiotics conjugated to glutathione, leading to
nephrotoxicity in rats and neurotoxicity in humans. GTK is expressed in kidney, liver, and brain.
Both human and rat GTKs contain a putative pyridoxal phosphate binding site (ExPASy ENZYME:
EC 2.6.1.64; Perry, S.J. et al. (1993) Mol. Pharmacol. 43:660-665; Perry, S. et al. (1995) FEBS Lett.
360:277-280; and Alberati-Giani, D. et al. (1995) J. Neurochem. 64:1448-1455). A second amino
transferase associated with this pathway is kynurenine/α-aminoadipate amino transferase (AadAT).
AadAT catalyzes the reversible conversion of α-aminoadipate and α-ketoglutarate to α-ketoadipate
and L-glutamate during lysine metabolism. AadAT also catalyzes the transamination of kynurenine
to kynurenic acid. A cytosolic AadAT is expressed in rat kidney, liver, and brain (Nakatani, Y. et al.
(1970) Biochim. Biophys. Acta 198:219-228; Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).

Glycosyl transferases include the mammalian UDP-glucuronosyl transferases, a family of
membrane-bound microsomal enzymes catalyzing the transfer of glucuronic acid to lipophilic
substrates in reactions that play important roles in detoxification and excretion of drugs, carcinogens,
and other foreign substances. Another mammalian glycosyl transferase, mammalian
UDP-galactose-ceramide galactosyl transferase, catalyzes the transfer of galactose to ceramide in the synthesis of
galactocerebrosides in myelin membranes of the nervous system. The UDP-glycosyl transferases
share a conserved signature domain of about 50 amino acid residues (PROSITE: PDOC00359,
http://expasy.hcuge.ch/sprot/prosite.html).

Methyl transferases are involved in a variety of pharmacologically important processes.
Nicotinamide N-methyl transferase catalyzes the N-methylation of nicotinamides and other pyridines,
an important step in the cellular handling of drugs and other foreign compounds.
Phenylethanolamine N-methyl transferase catalyzes the conversion of noradrenalin to adrenalin.
6-O-methylguanine-DNA methyl transferase reverses DNA methylation, an important step in
carcinogenesis. Uroporphyrin-III C-methyl transferase, which catalyzes the transfer of two methyl
groups from S-adenosyl-L-methionine to uroporphyrinogen III, is the first specific enzyme in the
biosynthesis of cobalamin, a dietary enzyme cofactor whose uptake is deficient in pernicious anemia.
Protein-arginine methyl transferases catalyze the posttranslational methylation of arginine residues in
proteins, resulting in the mono- and dimethylation of arginine on the guanidino group. Substrates
include histones, myelin basic protein, and heterogeneous nuclear ribonucleoproteins involved in
mRNA processing, splicing, and transport. Protein-arginine methyl transferase interacts with proteins
upregulated by mitogens, with proteins involved in chronic lymphocytic leukemia, and with
interferon, suggesting an important role for methylation in cytokine receptor signaling (Lin, W.-J. et
al. (1996) J. Biol. Chem. 271:15034-15044; Abramovich, C. et al. (1997) EMBO J. 16:260-266; and
Scott, H.S. et al. (1998) Genomics 48:330-340).

Phosphotransferases catalyze the transfer of high-energy phosphate groups and are important
in energy-requiring and -releasing reactions. The metabolic enzyme creatine kinase catalyzes the
reversible phosphate transfer between creatine/creatine phosphate and ATP/ADP. Glycocyamine
kinase catalyzes phosphate transfer from ATP to guanidoacetate, and arginine kinase catalyzes
phosphate transfer from ATP to arginine. A cysteine-containing active site is conserved in this
family (PROSITE: PDOC00103).

Prenyl transferases are heterodimers, consisting of an alpha and a beta subunit, that catalyze
the transfer of an isoprenyl group. An example of a prenyl transferase is the mammalian protein
farnesyl transferase. The alpha subunit of farnesyl transferase consists of 5 repeats of 34 amino acids
each, with each repeat containing an invariant tryptophan (PROSITE: PDOC00703).

Saccharyl transferases are glycating enzymes involved in a variety of metabolic processes.
Oligosaccharyl transferase-48, for example, is a receptor for advanced glycation endproducts.
Accumulation of these endproducts is observed in vascular complications of diabetes, macrovascular
disease, renal insufficiency, and Alzheimer's disease (Thornalley, P.J. (1998) Cell Mol. Biol.
(Noisy-Le-Grand) 44:1013-1023).

Coenzyme A (CoA) transferase catalyzes the transfer of CoA between two carboxylic acids. 
Succinyl CoA:3-oxoacid CoA transferase, for example, transfers CoA from succinyl-CoA to a 
recipient such as acetoacetate. Acetoacetate is essential to the metabolism of ketone bodies, which
accumulate in tissues affected by metabolic disorders such as diabetes (PROSITE: PDOC00980). 

Hydrolases

Hydrolysis is the breaking of a covalent bond in a substrate by introduction of a molecule of
water. The reaction involves a nucleophilic attack by the water molecule's oxygen atom on a target
bond in the substrate. The water molecule is split across the target bond, breaking the bond and
generating two product molecules. Hydrolases participate in reactions essential to such functions as
synthesis and degradation of cell components, and for regulation of cell functions including cell
signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Hydrolases are involved
in key steps in disease processes involving these functions. Hydrolytic enzymes, or hydrolases, may
be grouped by substrate specificity into classes including phosphatases, peptidases,
lysophospholipases, phosphodiesterases, glycosidases, and glyoxalases.

Phosphatases hydrolytically remove phosphate groups from proteins, an energy-providing
step that regulates many cellular processes, including intracellular signaling pathways that in turn 
control cell growth and differentiation, cell-cell contact, the cell cycle, and oncogenesis. 

Lysophospholipases (LPLs) regulate intracellular lipids by catalyzing the hydrolysis of ester
bonds to remove an acyl group, a key step in lipid degradation. Small LPL isoforms, approximately
15-30 kD, function as hydrolases; larger isoforms function both as hydrolases and transacylases. A
particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes. LPL activity
is regulated by signaling molecules important in numerous pathways, including the inflammatory
response.

Peptidases, also called proteases, cleave peptide bonds that form the backbone of peptide or
protein chains. Proteolytic processing is essential to cell growth, differentiation, remodeling, and
homeostasis as well as inflammation and immune response. Since typical protein half-lives range
from hours to a few days, peptidases are continually cleaving precursor proteins to their active form,
removing signal sequences from targeted proteins, and degrading aged or defective proteins.
Peptidases function in bacterial, parasitic, and viral invasion and replication within a host. Examples
of peptidases include trypsin and chymotrypsin (components of the complement cascade and the
blood-clotting cascade), lysosomal cathepsins, calpains, pepsin, renin, and chymosin (Beynon, R.J.
and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New
York NY, pp. 1-5).

The phosphodiesterases catalyze the hydrolysis of one of the two ester bonds in a
phosphodiester compound. Phosphodiesterases are therefore crucial to a variety of cellular processes.
Phosphodiesterases include DNA and RNA endo- and exo-nucleases, which are essential to cell
growth and replication as well as protein synthesis. Another phosphodiesterase is acid
sphingomyelinase, which hydrolyzes the membrane phospholipid sphingomyelin to ceramide and
phosphorylcholine. Phosphorylcholine is used in the synthesis of phosphatidylcholine, which is
involved in numerous intracellular signaling pathways. Ceramide is an essential precursor for the
generation of gangliosides, membrane lipids found in high concentration in neural tissue. Defective
acid sphingomyelinase phosphodiesterase leads to a build-up of sphingomyelin molecules in
lysosomes, resulting in Niemann-Pick disease.

Glycosidases catalyze the cleavage of hemiacetal bonds of glycosides, which are compounds
that contain one or more sugars. Mammalian lactase-phlorizin hydrolase, for example, is an intestinal
enzyme that splits lactose. Mammalian beta-galactosidase removes the terminal galactose from
gangliosides, glycoproteins, and glycosaminoglycans, and deficiency of this enzyme is associated
with a gangliosidosis known as Morquio disease type B. Vertebrate lysosomal alpha-glucosidase,
which hydrolyzes glycogen, maltose, and isomaltose, and vertebrate intestinal sucrase-isomaltase,
which hydrolyzes sucrose, maltose, and isomaltose, are widely distributed members of this family
with highly conserved sequences at their active sites.

The glyoxalase system is involved in gluconeogenesis, the production of glucose from
storage compounds in the body. It consists of glyoxalase I, which catalyzes the formation of S-D-
lactoylglutathione from methylglyoxal, a side product of triose-phosphate energy metabolism, and
glyoxalase II, which hydrolyzes S-D-lactoylglutathione to D-lactic acid and reduced glutathione.
Glyoxalases are involved in hyperglycemia, non-insulin-dependent diabetes mellitus, the
detoxification of bacterial toxins, and in the control of cell proliferation and microtubule assembly.
Lyases 

Lyases are a class of enzymes that catalyze the cleavage of C-C, C-O, C-N, C-S, C-(halide),
P-O, or other bonds without hydrolysis or oxidation to form two molecules, at least one of which
contains a double bond (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY,
p. 620). Lyases are critical components of cellular biochemistry with roles in metabolic energy
production including fatty acid metabolism, as well as other diverse enzymatic processes. Further
classification of lyases reflects the type of bond cleaved as well as the nature of the cleaved group. 

The group of C-C lyases includes carboxy-lyases (decarboxylases), aldehyde-lyases
(aldolases), oxo-acid-lyases, and others. The C-O lyase group includes hydro-lyases, lyases acting on
polysaccharides, and other lyases. The C-N lyase group includes ammonia-lyases, amidine-lyases,
amine-lyases (deaminases), and other lyases.

Proper regulation of lyases is critical to normal physiology. For example, mutation-induced
deficiencies in uroporphyrinogen decarboxylase can lead to photosensitive cutaneous lesions in
the genetically-linked disorder familial porphyria cutanea tarda (Mendez, M. et al. (1998) Am. J.
Genet. 63:1363-1375). It has also been shown that adenosine deaminase (ADA) deficiency stems
from genetic mutations in the ADA gene, resulting in the disorder severe combined
immunodeficiency disease (SCID) (Hershfield, M.S. (1998) Semin. Hematol. 35:291-298).
Isomerases

Isomerases are a class of enzymes that catalyze geometric or structural changes within a
molecule to form a single product. This class includes racemases and epimerases, cis-trans-isomerases,
intramolecular oxidoreductases, intramolecular transferases (mutases), and intramolecular
lyases. Isomerases are critical components of cellular biochemistry with roles in metabolic energy
production including glycolysis, as well as other diverse enzymatic processes (Stryer, L. (1995)
Biochemistry, W.H. Freeman and Co., New York NY, pp. 483-507).

Racemases are a subset of isomerases that catalyze inversion of a molecule's configuration
around the asymmetric carbon atom in a substrate having a single center of asymmetry, thereby
interconverting two racemers. Epimerases are another subset of isomerases that catalyze inversion of
configuration around an asymmetric carbon atom in a substrate with more than one center of
asymmetry, thereby interconverting two epimers. Racemases and epimerases can act on amino acids
and derivatives, hydroxy acids and derivatives, as well as carbohydrates and derivatives. The
interconversion of UDP-galactose and UDP-glucose is catalyzed by UDP-galactose-4'-epimerase.
Proper regulation and function of this epimerase is essential to the synthesis of glycoproteins and
glycolipids. Elevated blood galactose levels have been correlated with UDP-galactose-4'-epimerase
deficiency in screening programs of infants (Gitzelmann, R. (1972) Helv. Paediat. Acta 27:125-130).

Oxidoreductases can be isomerases as well. Oxidoreductases catalyze the reversible transfer
of electrons from a substrate that becomes oxidized to a substrate that becomes reduced. This class of
enzymes includes dehydrogenases, hydroxylases, oxidases, oxygenases, peroxidases, and reductases.
Proper maintenance of oxidoreductase levels is physiologically important. For example,
genetically-linked deficiencies in lipoamide dehydrogenase can result in lactic acidosis (Robinson, B.H. et al.
(1977) Pediat. Res. 11:1198-1202).

Another subgroup of isomerases are the transferases (or mutases). Transferases transfer a
chemical group from one compound (the donor) to another compound (the acceptor). The types of
groups transferred by these enzymes include acyl groups, amino groups, phosphate groups
(phosphotransferases or phosphomutases), and others. The transferase carnitine palmitoyltransferase
is an important component of fatty acid metabolism. Genetically-linked deficiencies in this
transferase can lead to myopathy (Scriver, C.R. et al. (1995) The Metabolic and Molecular Basis of
Inherited Disease, McGraw-Hill, New York NY, pp. 1501-1533).

Yet another subgroup of isomerases are the topoisomerases. Topoisomerases are enzymes
that affect the topological state of DNA. For example, defects in topoisomerases or their regulation
can affect normal physiology. Reduced levels of topoisomerase II have been correlated with some of
the DNA processing defects associated with the disorder ataxia-telangiectasia (Singh, S.P. et al.
(1988) Nucleic Acids Res. 16:3919-3929).
Ligases

Ligases catalyze the formation of a bond between two substrate molecules. The process 
involves the hydrolysis of a pyrophosphate bond in ATP or a similar energy donor. Ligases are 
classified based on the nature of the type of bond they form, which can include carbon-oxygen, 
carbon-sulfur, carbon-nitrogen, carbon-carbon and phosphoric ester bonds. 

Ligases forming carbon-oxygen bonds include the aminoacyl-transfer RNA (tRNA)
synthetases, which are important RNA-associated enzymes with roles in translation. Protein
biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The
aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino
acid with its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two
structural classes, and each class is characterized by a distinctive topology of the catalytic domain.
Class I enzymes contain a catalytic domain based on the nucleotide-binding Rossmann fold. Class II
enzymes contain a central catalytic domain, which consists of a seven-stranded antiparallel β-sheet
motif, as well as N- and C-terminal regulatory domains. Class II enzymes are separated into two
groups based on the heterodimeric or homodimeric structure of the enzyme; the latter group is further
subdivided by the structure of the N- and C-terminal regulatory domains (Hartlein, M. and S. Cusack
(1995) J. Mol. Evol. 40:519-530). Autoantibodies against aminoacyl-tRNAs are generated by
patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial
lung disease (ILD). These antibodies appear to be generated in response to viral infection, and
coxsackie virus has been used to induce experimental viral myositis in animals.

Ligases forming carbon-sulfur bonds (acid-thiol ligases) mediate a large number of cellular
biosynthetic and intermediary metabolism processes that involve intermolecular transfer of carbon
atom-containing substrates (carbon substrates). Examples of such reactions include the tricarboxylic
acid cycle, synthesis of fatty acids and long-chain phospholipids, synthesis of alcohols and aldehydes,
synthesis of intermediary metabolites, and reactions involved in the amino acid degradation
pathways. Some of these reactions require input of energy, usually in the form of conversion of ATP
to either ADP or AMP and pyrophosphate.

In many cases, a carbon substrate is derived from a small molecule containing at least two
carbon atoms. The carbon substrate is often covalently bound to a larger molecule which acts as a
carbon substrate carrier molecule within the cell. In the biosynthetic mechanisms described above,
the carrier molecule is coenzyme A. Coenzyme A (CoA) is structurally related to derivatives of the
nucleotide ADP and consists of 4'-phosphopantetheine linked via a phosphodiester bond to the alpha
phosphate group of adenosine 3',5'-bisphosphate. The terminal thiol group of 4'-phosphopantetheine
acts as the site for carbon substrate bond formation. The predominant carbon substrates which utilize
CoA as a carrier molecule during biosynthesis and intermediary metabolism in the cell are acetyl,
succinyl, and propionyl moieties, collectively referred to as acyl groups. Other carbon substrates
include enoyl lipid, which acts as a fatty acid oxidation intermediate, and carnitine, which acts as an
acetyl-CoA flux regulator/mitochondrial acyl group transfer protein. Acyl-CoA and acetyl-CoA are
synthesized in the cell by acyl-CoA synthetase and acetyl-CoA synthetase, respectively.

Activation of fatty acids is mediated by at least three forms of acyl-CoA synthetase activity:
i) acetyl-CoA synthetase, which activates acetate and several other low molecular weight carboxylic
acids and is found in muscle mitochondria and the cytosol of other tissues; ii) medium-chain
acyl-CoA synthetase, which activates fatty acids containing between four and eleven carbon atoms
(predominantly from dietary sources), and is present only in liver mitochondria; and iii) acyl-CoA
synthetase, which is specific for long chain fatty acids with between six and twenty carbon atoms, and
is found in microsomes and the mitochondria. Proteins associated with acyl-CoA synthetase activity
have been identified from many sources including bacteria, yeast, plants, mouse, and man. The
activity of acyl-CoA synthetase may be modulated by phosphorylation of the enzyme by
cAMP-dependent protein kinase.
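
The chain-length specificities just described can be summarized in a minimal illustrative sketch. The medium-chain (four to eleven carbons) and long-chain (six to twenty carbons) ranges are taken from the paragraph above. The two-to-three carbon range used for acetyl-CoA synthetase is an assumption made only for illustration, since the text gives no numeric limits for "low molecular weight carboxylic acids". The function name, the dictionary name, and the label "long-chain acyl-CoA synthetase" are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Inclusive carbon-count ranges for each acyl-CoA synthetase activity.
# Medium- and long-chain ranges follow the text; the acetyl-CoA synthetase
# range (2-3 carbons) is an ASSUMPTION for illustration only.
SYNTHETASE_RANGES: Dict[str, Tuple[int, int]] = {
    "acetyl-CoA synthetase": (2, 3),               # assumed range
    "medium-chain acyl-CoA synthetase": (4, 11),   # from the text
    "long-chain acyl-CoA synthetase": (6, 20),     # from the text
}


def candidate_synthetases(carbon_atoms: int) -> List[str]:
    """Return the synthetase activities whose stated range covers the chain length."""
    if carbon_atoms < 1:
        raise ValueError("carbon_atoms must be a positive integer")
    return [
        name
        for name, (low, high) in SYNTHETASE_RANGES.items()
        if low <= carbon_atoms <= high
    ]


if __name__ == "__main__":
    for n in (2, 8, 16, 24):
        print(n, candidate_synthetases(n))
```

Because the stated medium-chain and long-chain ranges overlap between six and eleven carbons, the sketch returns a list rather than a single activity; a chain length outside every range returns an empty list.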

Ligases forming carbon-nitrogen bonds include amide synthases such as glutamine synthetase
(glutamate-ammonia ligase) that catalyzes the amination of glutamic acid to glutamine by ammonia
using the energy of ATP hydrolysis. Glutamine is the primary source for the amino group in various
amide transfer reactions involved in de novo pyrimidine nucleotide synthesis and in purine and
pyrimidine ribonucleotide interconversions. Overexpression of glutamine synthetase has been
observed in primary liver cancer (Christa, L. et al. (1994) Gastroent. 106:1312-1320).

Acid-amino-acid ligases (peptide synthases) are represented by the ubiquitin proteases which
are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of
cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of
abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular
processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins
targeted for degradation are conjugated to ubiquitin (Ub), a small heat stable protein. Ub is first
activated by a ubiquitin-activating enzyme (E1), and then transferred to one of several
Ub-conjugating enzymes (E2). E2 then links the Ub molecule through its C-terminal glycine to an
internal lysine (acceptor lysine) of a target protein. The ubiquitinated protein is then recognized and
degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released
for reutilization by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclic
kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors
associated with signal transduction, transcriptional regulators, and mutated or damaged proteins
(Ciechanover, A. (1994) Cell 79:13-21). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin
protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human
homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung
(Gray, D.A. (1995) Oncogene 10:2179-2183).

Cyclo-ligases and other carbon-nitrogen ligases comprise various enzymes and enzyme
complexes that participate in the de novo pathways to purine and pyrimidine biosynthesis. Because
these pathways are critical to the synthesis of nucleotides for replication of both RNA and DNA,
many of these enzymes have been the targets of clinical agents for the treatment of cell proliferative
disorders such as cancer and infectious diseases.

Purine biosynthesis occurs de novo from the amino acids glycine and glutamine, and other
small molecules. Three of the key reactions in this process are catalyzed by a trifunctional enzyme
composed of glycinamide-ribonucleotide synthetase (GARS), aminoimidazole ribonucleotide
synthetase (AIRS), and glycinamide ribonucleotide transformylase (GART). Together these three
enzymes combine ribosylamine phosphate with glycine to yield phosphoribosyl aminoimidazole, a
precursor to both adenylate and guanylate nucleotides. This trifunctional protein has been implicated
in the pathology of Down's syndrome (Aimi, J. et al. (1990) Nucleic Acids Res. 18:6665-6672).
Adenylosuccinate synthetase catalyzes a later step in purine biosynthesis that converts inosinic acid to
adenylosuccinate, a key step on the path to ATP synthesis. This enzyme is also similar to another
carbon-nitrogen ligase, argininosuccinate synthetase, that catalyzes a similar reaction in the urea cycle
(Powell, S.M. et al. (1992) FEBS Lett. 303:4-10).

Like the de novo biosynthesis of purines, de novo synthesis of the pyrimidine nucleotides
uridylate and cytidylate also arises from a common precursor, in this instance the nucleotide
orotidylate derived from orotate and phosphoribosyl pyrophosphate (PPRP). Again a trifunctional
enzyme comprising three carbon-nitrogen ligases plays a key role in the process. In this case the
enzymes aspartate transcarbamylase (ATCase), carbamyl phosphate synthetase II, and dihydroorotase
(DHOase) are encoded by a single gene called CAD. Together these three enzymes combine the
initial reactants in pyrimidine biosynthesis, glutamine, CO2, and ATP, to form dihydroorotate, the
precursor to orotate and orotidylate (Iwahana, H. et al. (1996) Biochem. Biophys. Res. Commun.
219:249-255). Further steps then lead to the synthesis of uridine nucleotides from orotidylate.
Cytidine nucleotides are derived from uridine-5'-triphosphate (UTP) by the amidation of UTP using
glutamine as the amino donor and the enzyme CTP synthetase. Regulatory mutations in the human
CTP synthetase are believed to confer multi-drug resistance to agents widely used in cancer therapy
(Yamauchi, M. et al. (1990) EMBO J. 9:2095-2099).

Ligases forming carbon-carbon bonds include the carboxylases acetyl-CoA carboxylase and
pyruvate carboxylase. Acetyl-CoA carboxylase catalyzes the carboxylation of acetyl-CoA from CO2
and H2O using the energy of ATP hydrolysis. Acetyl-CoA carboxylase is the rate-limiting enzyme in the
biogenesis of long-chain fatty acids. Two isoforms of acetyl-CoA carboxylase, types I and II,
are expressed in humans in a tissue-specific manner (Ha, J. et al. (1994) Eur. J. Biochem.
219:297-306). Pyruvate carboxylase is a nuclear-encoded mitochondrial enzyme that catalyzes the conversion
of pyruvate to oxaloacetate, a key intermediate in the citric acid cycle.

Ligases forming phosphoric ester bonds include the DNA ligases involved in both DNA
replication and repair. DNA ligases seal phosphodiester bonds between two adjacent nucleotides in a
DNA chain using the energy from ATP hydrolysis to first activate the free 5'-phosphate of one
nucleotide and then react it with the 3'-OH group of the adjacent nucleotide. This resealing reaction
is used both in DNA replication, to join small DNA fragments called Okazaki fragments that are
transiently formed in the process of replicating new DNA, and in DNA repair. DNA repair is the
process by which accidental base changes, such as those produced by oxidative damage, hydrolytic
attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the
DNA can occur. Bloom's syndrome is an inherited human disease in which individuals are partially
deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, B. et al.
(1994) The Molecular Biology of the Cell, Garland Publishing Inc., New York NY, p. 247).

Molecules Associated with Growth and Development

SEQ ID NO:51 and SEQ ID NO:52 encode, for example, molecules associated with growth
and development.

Human growth and development requires the spatial and temporal regulation of cell
differentiation, cell proliferation, and apoptosis. These processes coordinately control reproduction,
aging, embryogenesis, morphogenesis, organogenesis, and tissue repair and maintenance. At the
cellular level, growth and development is governed by the cell's decision to enter into or exit from the
cell division cycle and by the cell's commitment to a terminally differentiated state. These decisions
are made by the cell in response to extracellular signals and other environmental cues it receives. The
following discussion focuses on the molecular mechanisms of cell division, reproduction, cell
differentiation and proliferation, apoptosis, and aging.
Cell Division 

Cell division is the fundamental process by which all living things grow and reproduce. In
unicellular organisms such as yeast and bacteria, each cell division doubles the number of organisms,
while in multicellular species many rounds of cell division are required to replace cells lost by wear or
by programmed cell death, and for cell differentiation to produce a new tissue or organ. Details of the
cell division cycle may vary, but the basic process consists of three principal events. The first event,
interphase, involves preparations for cell division, replication of the DNA, and production of essential
proteins. In the second event, mitosis, the nuclear material is divided and separates to opposite sides of
the cell. The final event, cytokinesis, is division and fission of the cell cytoplasm. The sequence and
timing of cell cycle transitions are under the control of the cell cycle regulation system, which controls the
process by positive or negative regulatory circuits at various check points.

Regulated progression of the cell cycle depends on the integration of growth control pathways
with the basic cell cycle machinery. Cell cycle regulators have been identified by selecting for human
and yeast cDNAs that block or activate cell cycle arrest signals in the yeast mating pheromone pathway
when they are overexpressed. Known regulators include human CPR (cell cycle progression
restoration) genes, such as CPR8 and CPR2, and yeast CDC (cell division control) genes, including
CDC91, that block the arrest signals. The CPR genes express a variety of proteins including cyclins,
tumor suppressor binding proteins, chaperones, transcription factors, translation factors, and
RNA-binding proteins (Edwards, M.C. et al. (1997) Genetics 147:1063-1076).

Several cell cycle transitions, including the entry and exit of a cell from mitosis, are dependent
upon the activation and inhibition of cyclin-dependent kinases (Cdks). The Cdks are composed of a
kinase subunit, Cdk, and an activating subunit, cyclin, in a complex that is subject to many levels of
regulation. There appears to be a single Cdk in Saccharomyces cerevisiae and Schizosaccharomyces pombe
whereas mammals have a variety of specialized Cdks. Cyclins act by binding to and activating
cyclin-dependent protein kinases which then phosphorylate and activate selected proteins involved in the
mitotic process. The Cdk-cyclin complex is both positively and negatively regulated by
phosphorylation, and by targeted degradation involving molecules such as CDC4 and CDC53. In
addition, Cdks are further regulated by binding to inhibitors and other proteins such as Suc1 that
modify their specificity or accessibility to regulators (Patra, D. and W.G. Dunphy (1996) Genes Dev.
10:1503-1515; and Mathias, N. et al. (1996) Mol. Cell Biol. 16:6634-6643).
Reproduction 

The male and female reproductive systems are complex and involve many aspects of growth
and development. The anatomy and physiology of the male and female reproductive systems are
reviewed in Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Co., Philadelphia
PA, pp. 899-928.

The male reproductive system includes the process of spermatogenesis, in which the sperm are
formed, and male reproductive functions are regulated by various hormones and their effects on
accessory sexual organs, cellular metabolism, growth, and other bodily functions.

Spermatogenesis begins at puberty as a result of stimulation by gonadotropic hormones
released from the anterior pituitary. Immature sperm (spermatogonia) undergo several mitotic cell
divisions before undergoing meiosis and full maturation. The testes secrete several male sex hormones,
the most abundant being testosterone, which is essential for growth and division of the immature sperm,
and for the masculine characteristics of the male body. Three other male sex hormones,
gonadotropin-releasing hormone (GnRH), luteinizing hormone (LH), and follicle-stimulating hormone (FSH), control
sexual function.

The uterus, ovaries, fallopian tubes, vagina, and breasts comprise the female reproductive
system. The ovaries and uterus are the source of ova and the location of fetal development,
respectively. The fallopian tubes and vagina are accessory organs attached to the top and bottom of the
uterus, respectively. Both the uterus and ovaries have additional roles in the development and loss of
reproductive capability during a female's lifetime. The primary role of the breasts is lactation.
Multiple endocrine signals from the ovaries, uterus, pituitary, hypothalamus, adrenal glands, and other
tissues coordinate reproduction and lactation. These signals vary during the monthly menstruation
cycle and during the female's lifetime. Similarly, the sensitivity of reproductive organs to these
endocrine signals varies during the female's lifetime.

A combination of positive and negative feedback to the ovaries, pituitary and hypothalamus
glands controls physiologic changes during the monthly ovulation and endometrial cycles. The anterior
pituitary secretes two major gonadotropin hormones, follicle-stimulating hormone (FSH) and luteinizing
hormone (LH), regulated by negative feedback of steroids, most notably by ovarian estradiol. If
fertilization does not occur, estrogen and progesterone levels decrease. This sudden reduction of the
ovarian hormones leads to menstruation, the desquamation of the endometrium.

Hormones further govern all the steps of pregnancy, parturition, lactation, and menopause.
During pregnancy large quantities of human chorionic gonadotropin (hCG), estrogens, progesterone,
and human chorionic somatomammotropin (hCS) are formed by the placenta. hCG, a glycoprotein
similar to luteinizing hormone, stimulates the corpus luteum to continue producing more progesterone
and estrogens, rather than to involute as occurs if the ovum is not fertilized. hCS is similar to growth
hormone and is crucial for fetal nutrition.

The female breast also matures during pregnancy. Large amounts of estrogen secreted by the 
placenta trigger growth and branching of the breast milk ductal system while lactation is initiated by the 
secretion of prolactin by the pituitary gland. 

Parturition involves several hormonal changes that increase uterine contractility toward the end
of pregnancy, as follows. The levels of estrogens increase more than those of progesterone. Oxytocin
is secreted by the neurohypophysis. Concomitantly, uterine sensitivity to oxytocin increases. The fetus
itself secretes oxytocin, cortisol (from adrenal glands), and prostaglandins.

Menopause occurs when most of the ovarian follicles have degenerated. The ovary then
produces less estradiol, reducing the negative feedback on the pituitary and hypothalamus glands.
Mean levels of circulating FSH and LH increase, even as ovulatory cycles continue. Therefore, the
ovary is less responsive to gonadotropins, and there is an increase in the time between menstrual cycles.
Consequently, menstrual bleeding ceases and reproductive capability ends.
Cell Differentiation and Proliferation

Tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation,
and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their
spatial organization. This regulation depends upon the appropriate expression of proteins which control
cell cycle progression in response to extracellular signals, such as growth factors and other mitogens,
and intracellular cues, such as DNA damage or nutrient starvation. Molecules which directly or
indirectly modulate cell cycle progression fall into several categories, including growth factors and their
receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor
proteins, and mitosis-promoting factors.

Growth factors were originally described as serum factors required to promote cell
proliferation. Most growth factors are large, secreted polypeptides that act on cells in their local
environment. Growth factors bind to and activate specific cell surface receptors and initiate
intracellular signal transduction cascades. Many growth factor receptors are classified as receptor
tyrosine kinases which undergo autophosphorylation upon ligand binding. Autophosphorylation
enables the receptor to interact with signal transduction proteins characterized by the presence of SH2
or SH3 domains (Src homology regions 2 or 3). These proteins then modulate the activity state of
small G-proteins, such as Ras, Rab, and Rho, along with GTPase activating proteins (GAPs), guanine
nucleotide releasing proteins (GNRPs), and other guanine nucleotide exchange factors. Small G
proteins act as molecular switches that activate other downstream events, such as mitogen-activated
protein kinase (MAP kinase) cascades. MAP kinases ultimately activate transcription of
mitosis-promoting genes.

In addition to growth factors, small signaling peptides and hormones also influence cell
proliferation. These molecules bind primarily to another class of receptor, the trimeric G-protein
coupled receptor (GPCR), found predominantly on the surface of immune, neuronal and neuroendocrine
cells. Upon ligand binding, the GPCR activates a trimeric G protein which in turn triggers increased
levels of intracellular second messengers such as phospholipase C, Ca2+, and cyclic AMP. Most
GPCR-mediated signaling pathways indirectly promote cell proliferation by causing the secretion or
breakdown of other signaling molecules that have direct mitogenic effects. These signaling cascades
often involve activation of kinases and phosphatases. Some growth factors, such as some members of
the transforming growth factor beta (TGF-β) family, act on some cells to stimulate cell proliferation
and on other cells to inhibit it. Growth factors may also stimulate a cell at one concentration and inhibit
the same cell at another concentration. Most growth factors also have a multitude of other actions
besides the regulation of cell growth and division: they can control the proliferation, survival,
differentiation, migration, or function of cells depending on the circumstance. For example, the tumor
necrosis factor/nerve growth factor (TNF/NGF) family can activate or inhibit cell death, as well as
regulate proliferation and differentiation. The cell response depends on the type of cell, its stage of
differentiation and transformation status, which surface receptors are stimulated, and the types of
stimuli acting on the cell (Smith, A. et al. (1994) Cell 76:959-962; and Nocentini, G. et al. (1997) Proc.
Natl. Acad. Sci. USA 94:6216-6221).

Neighboring cells in a tissue compete for growth factors, and when provided with "unlimited"
quantities in a perfused system will grow to even higher cell densities before reaching density-dependent
inhibition of cell division. Cells often demonstrate an anchorage dependence of cell division as well.
This anchorage dependence may be associated with the formation of focal contacts linking the
cytoskeleton with the extracellular matrix (ECM). The expression of ECM components can be
stimulated by growth factors. For example, TGF-β stimulates fibroblasts to produce a variety of ECM
proteins, including fibronectin, collagen, and tenascin (Pearson, C.A. et al. (1988) EMBO J.
7:2677-2981). In fact, for some cell types specific ECM molecules, such as laminin or fibronectin, may act as
growth factors. Tenascin-C and -R, expressed in developing and lesioned neural tissue, provide
stimulatory/anti-adhesive or inhibitory properties, respectively, for axonal growth (Faissner, A. (1997)
Cell Tissue Res. 290:331-341).

Cancers are associated with the activation of oncogenes which are derived from normal cellular
genes. These oncogenes encode oncoproteins which convert normal cells into malignant cells. Some
oncoproteins are mutant isoforms of the normal protein, and other oncoproteins are abnormally
expressed with respect to location or amount of expression. The latter category of oncoprotein causes
cancer by altering transcriptional control of cell proliferation. Five classes of oncoproteins are known
to affect cell cycle controls. These classes include growth factors, growth factor receptors, intracellular
signal transducers, nuclear transcription factors, and cell-cycle control proteins. Viral oncogenes are
integrated into the human genome after infection of human cells by certain viruses. Examples of viral
oncogenes include v-src, v-abl, and v-fps.

Many oncogenes have been identified and characterized. These include sis, erbA, erbB, her-2,
mutated Gs, src, abl, ras, crk, jun, fos, myc, and mutated tumor-suppressor genes such as RB, p53,
mdm2, Cip1, p16, and cyclin D. Transformation of normal genes to oncogenes may also occur by
chromosomal translocation. The Philadelphia chromosome, characteristic of chronic myeloid leukemia
and a subset of acute lymphoblastic leukemias, results from a reciprocal translocation between
chromosomes 9 and 22 that moves a truncated portion of the proto-oncogene c-abl to the breakpoint
cluster region (bcr) on chromosome 22.

Tumor-suppressor genes are involved in regulating cell proliferation. Mutations which cause
reduced or loss of function in tumor-suppressor genes result in uncontrolled cell proliferation. For
example, the retinoblastoma gene product (RB), in a non-phosphorylated state, binds several early-
response genes and suppresses their transcription, thus blocking cell division. Phosphorylation of RB
causes it to dissociate from the genes, releasing the suppression, and allowing cell division to proceed.

Apoptosis

Apoptosis is the genetically controlled process by which unneeded or defective cells undergo
programmed cell death. Selective elimination of cells is as important for morphogenesis and tissue
remodeling as is cell proliferation and differentiation. Lack of apoptosis may result in hyperplasia and
other disorders associated with increased cell proliferation. Apoptosis is also a critical component of
the immune response. Immune cells such as cytotoxic T-cells and natural killer cells prevent the spread
of disease by inducing apoptosis in tumor cells and virus-infected cells. In addition, immune cells that
fail to distinguish self molecules from foreign molecules must be eliminated by apoptosis to avoid an
autoimmune response.

Apoptotic cells undergo distinct morphological changes. Hallmarks of apoptosis include cell
shrinkage, nuclear and cytoplasmic condensation, and alterations in plasma membrane topology.
Biochemically, apoptotic cells are characterized by increased intracellular calcium concentration,
fragmentation of chromosomal DNA, and expression of novel cell surface components.

The molecular mechanisms of apoptosis are highly conserved, and many of the key protein
regulators and effectors of apoptosis have been identified. Apoptosis generally proceeds in response to
a signal which is transduced intracellularly and results in altered patterns of gene expression and protein
activity. Signaling molecules such as hormones and cytokines are known both to stimulate and to
inhibit apoptosis through interactions with cell surface receptors. Transcription factors also play an
important role in the onset of apoptosis. A number of downstream effector molecules, particularly
proteases such as the cysteine proteases called caspases, have been implicated in the degradation of
cellular components and the proteolytic activation of other apoptotic effectors.
Aging and Senescence

Studies of the aging process or senescence have shown a number of characteristic cellular and
molecular changes (Fauci et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New
York NY, p. 37). These characteristics include increases in chromosome structural abnormalities, DNA
cross-linking, incidence of single-stranded breaks in DNA, losses in DNA methylation, and degradation
of telomere regions. In addition to these DNA changes, post-translational alterations of proteins
increase, including deamidation, oxidation, cross-linking, and nonenzymatic glycation. Still further
molecular changes occur in the mitochondria of aging cells through deterioration of structure. These
changes eventually contribute to decreased function in every organ of the body.

Biochemical Pathway Molecules 

SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, and SEQ ID NO:50 encode, for example, 
biochemical pathway molecules. 

Biochemical pathways are responsible for regulating metabolism, growth and development,
protein secretion and trafficking, environmental responses, and ecological interactions including
immune response and response to parasites.
DNA Replication

Deoxyribonucleic acid (DNA), the genetic material, is found in both the nucleus and
mitochondria of human cells. The bulk of human DNA is nuclear, in the form of linear chromosomes,
while mitochondrial DNA is circular. DNA replication begins at specific sites called origins of
replication. Bidirectional synthesis occurs from the origin via two growing forks that move in opposite
directions. Replication is semi-conservative, with each daughter duplex containing one old strand and
its newly synthesized complementary partner. Proteins involved in DNA replication include DNA
polymerases, DNA primase, telomerase, DNA helicase, topoisomerases, DNA ligases, replication
factors, and DNA-binding proteins.
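
The semi-conservative outcome described above can be illustrated with a minimal sketch. The function
names and the four-base example sequence are hypothetical and chosen only for illustration; strands are
written 5' to 3', so the partner of a strand is its reverse complement.

```python
# Illustrative sketch only; names and sequences are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the complementary strand written 5'->3' (the reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Copy a duplex; each daughter pairs one old strand with a newly made partner."""
    return [(old, complement_strand(old)) for old in duplex]

parent = ("ATGC", complement_strand("ATGC"))
daughters = replicate(parent)
```

Each tuple in `daughters` holds one parental strand followed by its newly synthesized partner, so both
daughter duplexes carry the same sequence information as the parent.
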
DNA Recombination and Repair 

Cells are constantly faced with replication errors and environmental assault (such as ultraviolet
irradiation) that can produce DNA damage. Damage to DNA consists of any change that modifies the
structure of the molecule. Changes to DNA can be divided into two general classes, single base
changes and structural distortions. Any damage to DNA can produce a mutation, and the mutation may
produce a disorder, such as cancer.

Changes in DNA are recognized by repair systems within the cell. These repair systems act to
correct the damage and thus prevent any deleterious effects of a mutational event. Repair systems can
be divided into three general types, direct repair, excision repair, and retrieval systems. Proteins
involved in DNA repair include DNA polymerase, excision repair proteins, excision and cross link
repair proteins, recombination and repair proteins, RAD51 proteins, and BLM and WRN proteins that
are homologs of RecQ helicase. When the repair systems are eliminated, cells become exceedingly
sensitive to environmental mutagens, such as ultraviolet irradiation. Patients with disorders associated
with a loss in DNA repair systems often exhibit a high sensitivity to environmental mutagens.
Examples of such disorders include xeroderma pigmentosum (XP), Bloom's syndrome (BS), and
Werner's syndrome (WS) (Yamagata, K. et al. (1998) Proc. Natl. Acad. Sci. USA 95:8733-8738),
ataxia telangiectasia, Cockayne's syndrome, and Fanconi's anemia.

Recombination is the process whereby new DNA sequences are generated by the movements of
large pieces of DNA. In homologous recombination, which occurs during meiosis and DNA repair,
parent DNA duplexes align at regions of sequence similarity, and new DNA molecules form by the
breakage and joining of homologous segments. Proteins involved include RAD51 recombinase. In site-
specific recombination, two specific but not necessarily homologous DNA sequences are exchanged. In
the immune system this process generates a diverse collection of antibody and T cell receptor genes.
Proteins involved in site-specific recombination in the immune system include recombination activating
genes 1 and 2 (RAG1 and RAG2). A defect in immune system site-specific recombination causes
severe combined immunodeficiency disease in mice.
RNA Metabolism 

Ribonucleic acid (RNA) is a linear single-stranded polymer of four nucleotides, ATP, CTP,
UTP, and GTP. In most organisms, RNA is transcribed as a copy of DNA, the genetic material of the
organism. In retroviruses RNA rather than DNA serves as the genetic material. RNA copies of the
genetic material encode proteins or serve various structural, catalytic, or regulatory roles in organisms.
RNA is classified according to its cellular localization and function. Messenger RNAs (mRNAs)
encode polypeptides. Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into
ribosomes, which are cytoplasmic particles that translate mRNA into polypeptides. Transfer RNAs
(tRNAs) are cytosolic adaptor molecules that function in mRNA translation by recognizing both an
mRNA codon and the amino acid that matches that codon. Heterogeneous nuclear RNAs (hnRNAs)
include mRNA precursors and other nuclear RNAs of various sizes. Small nuclear RNAs (snRNAs)
are a part of the nuclear spliceosome complex that removes intervening, non-coding sequences (introns)
and rejoins exons in pre-mRNAs.
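
The net effect of splicing, removal of introns and rejoining of exons, can be sketched as a simple
string operation. This is only an illustration under stated assumptions: the function name, the example
sequence, and the intron coordinates are hypothetical, and the sketch does not model how the
spliceosome locates intron boundaries.

```python
# Illustrative sketch only; sequence and coordinates are hypothetical.
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as half-open (start, end) index pairs and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)

pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"  # exon 1 + intron + exon 2
mature = splice(pre_mrna, [(6, 18)])
```

Here `mature` contains only the two exon segments joined end to end.
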

RNA Transcription

The transcription process synthesizes an RNA copy of DNA. Proteins involved include multi-
subunit RNA polymerases, transcription factors IIA, IIB, IID, IIE, IIF, IIH, and IIJ. Many
transcription factors incorporate DNA-binding structural motifs which comprise either α-helices or
β-sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-
helix, zinc finger, leucine zipper, and helix-loop-helix.
RNA Processing 

Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA
processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and
splicing to remove introns. The spliceosomal complex is comprised of five small nuclear
ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6. Each snRNP contains a
single species of snRNA and about ten proteins. The RNA components of some snRNPs recognize and
base-pair with intron consensus sequences. The protein components mediate spliceosome assembly and
the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of patients with
systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New
York NY, p. 863).

Heterogeneous nuclear ribonucleoproteins (hnRNPs) have been identified that have roles in
splicing, exporting of the mature RNAs to the cytoplasm, and mRNA translation (Biamonti, G. et al.
(1998) Clin. Exp. Rheumatol. 16:317-326). Some examples of hnRNPs include the yeast proteins
Hrp1p, involved in cleavage and polyadenylation at the 3' end of the RNA; Cbp80p, involved in
capping the 5' end of the RNA; and Npl3p, a homolog of mammalian hnRNP A1, involved in export of
mRNA from the nucleus (Shen, E.C. et al. (1998) Genes Dev. 12:679-691). HnRNPs have been shown
to be important targets of the autoimmune response in rheumatic diseases (Biamonti, supra).

Many snRNP proteins, hnRNP proteins, and alternative splicing factors are characterized by
an RNA recognition motif (RRM). (Reviewed in Birney, E. et al. (1993) Nucleic Acids Res. 21:5803-
5816.) The RRM is about 80 amino acids in length and forms four β-strands and two α-helices
arranged in an α/β sandwich. The RRM contains a core RNP-1 octapeptide motif along with
surrounding conserved sequences.
RNA Stability and Degradation 

RNA helicases alter and regulate RNA conformation and secondary structure by using energy
derived from ATP hydrolysis to destabilize and unwind RNA duplexes. The most well-characterized
and ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type
ATP-binding motif which is diagnostic of proteins in this family. Over 40 DEAD-box helicases have
been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants.
DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome
assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-
specific roles in spermatogenesis and embryogenesis. (Reviewed in Linder, P. et al. (1989) Nature
337:121-122.)

Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of
neuroblastoma (Nb) and retinoblastoma (Rb) tumors. Other DEAD-box helicases have been implicated
either directly or indirectly in ultraviolet light-induced tumors, B cell lymphoma, and myeloid
malignancies. (Reviewed in Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168.)

Ribonucleases (RNases) catalyze the hydrolysis of phosphodiester bonds in RNA chains, thus
cleaving the RNA. For example, RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of
pre-tRNAs as part of their maturation process. RNase H digests the RNA strand of an RNA/DNA
hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in
the retroviral replication cycle. RNase H domains are often found as a domain associated with reverse
transcriptases. RNase activity in serum and cell extracts is elevated in a variety of cancers and
infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is
being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and
replication, and fungal infections.
Protein Translation

The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit,
which together form the 80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome
also contains more than fifty proteins. The ribosomal proteins have a prefix which denotes the subunit
to which they belong, either L (large) or S (small). Three important sites are identified on the ribosome.
The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA)
bind on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are
formed, as well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs
bind prior to their release from the ribosome. (Translation is reviewed in Stryer, L. (1995)
Biochemistry, W.H. Freeman and Company, New York NY, pp. 875-908; and Lodish, H. et al. (1995)
Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
tRNA Charging

Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA.
The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino
acid with its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two
structural classes, Class I and Class II. Autoantibodies against aminoacyl-tRNAs are generated by
patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial
lung disease (ILD). These antibodies appear to be generated in response to viral infection, and
coxsackie virus has been used to induce experimental viral myositis in animals.

Translation Initiation

Initiation of translation can be divided into three stages. The first stage brings an initiator
transfer RNA (Met-tRNAf) together with the 40S ribosomal subunit to form the 43S preinitiation
complex. The second stage binds the 43S preinitiation complex to the mRNA, followed by migration of
the complex to the correct AUG initiation codon. The third stage brings the 60S ribosomal subunit to
the 40S subunit to generate an 80S ribosome at the initiation codon. Regulation of translation primarily
involves the first and second stage in the initiation process (Pain, V.M. (1996) Eur. J. Biochem.
236:747-771).

Several initiation factors, many of which contain multiple subunits, are involved in bringing an
initiator tRNA and 40S ribosomal subunit together. eIF2, a guanine nucleotide binding protein, recruits
the initiator tRNA to the 40S ribosomal subunit. Only when eIF2 is bound to GTP does it associate
with the initiator tRNA. eIF2B, a guanine nucleotide exchange protein, is responsible for converting
eIF2 from the GDP-bound inactive form to the GTP-bound active form. Two other factors, eIF1A and
eIF3, bind and stabilize the 40S subunit by interacting with 18S ribosomal RNA and specific ribosomal
structural proteins. eIF3 is also involved in association of the 40S ribosomal subunit with mRNA. The
Met-tRNAf, eIF1A, eIF3, and 40S ribosomal subunit together make up the 43S preinitiation complex
(Pain, supra).

Additional factors are required for binding of the 43S preinitiation complex to an mRNA
molecule, and the process is regulated at several levels. eIF4F is a complex consisting of three proteins:
eIF4E, eIF4A, and eIF4G. eIF4E recognizes and binds to the mRNA 5'-terminal m7GTP cap. eIF4A is
a bidirectional RNA-dependent helicase, and eIF4G is a scaffolding polypeptide. eIF4G has three
binding domains. The N-terminal third of eIF4G interacts with eIF4E, the central third interacts with
eIF4A, and the C-terminal third interacts with eIF3 bound to the 43S preinitiation complex. Thus,
eIF4G acts as a bridge between the 40S ribosomal subunit and the mRNA (Hentze, M.W. (1997)
Science 275:500-501).

The ability of eIF4F to initiate binding of the 43S preinitiation complex is regulated by
structural features of the mRNA. The mRNA molecule has an untranslated region (UTR) between the
5' cap and the AUG start codon. In some mRNAs this region forms secondary structures that impede
binding of the 43S preinitiation complex. The helicase activity of eIF4A is thought to function in
removing this secondary structure to facilitate binding of the 43S preinitiation complex (Pain, supra).
Translation Elongation

Elongation is the process whereby additional amino acids are joined to the initiator methionine
to form the complete polypeptide chain. The elongation factors EF1α, EF1βγ, and EF2 are involved
in elongating the polypeptide chain following initiation. EF1α is a GTP-binding protein. In EF1α's
GTP-bound form, it brings an aminoacyl-tRNA to the ribosome's A site. The amino acid attached to
the newly arrived aminoacyl-tRNA forms a peptide bond with the initiator methionine. The GTP on
EF1α is hydrolyzed to GDP, and EF1α-GDP dissociates from the ribosome. EF1βγ binds EF1α-GDP
and induces the dissociation of GDP from EF1α, allowing EF1α to bind GTP and a new cycle to begin.

As subsequent aminoacyl-tRNAs are brought to the ribosome, EF-G, another GTP-binding
protein, catalyzes the translocation of tRNAs from the A site to the P site and finally to the E site of the
ribosome. This allows the processivity of translation.
Translation Termination 

The release factor eRF carries out termination of translation. eRF recognizes stop codons in
the mRNA, leading to the release of the polypeptide chain from the ribosome.
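
The three phases described above (initiation at an AUG codon, codon-by-codon elongation, and
termination at a stop codon) can be sketched as follows. This is a simplified illustration, not a model of
the factors involved: it assumes the standard genetic code, takes the first AUG in the sequence as the
initiation codon, and uses a hypothetical function name and example sequence.

```python
# Illustrative sketch only; assumes the standard genetic code and first-AUG initiation.
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")              # initiation codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":                # stop codon: chain is released
            break
        peptide.append(residue)           # elongation by one residue
    return "".join(peptide)

peptide = translate("GGAUGGCCUAAUU")
```

In this example the two bases before the AUG stand in for a 5' untranslated region, and translation
stops at the UAA codon.
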
Post-Translational Pathways

Proteins may be modified after translation by the addition of phosphate, sugar, prenyl, fatty
acid, and other chemical groups. These modifications are often required for proper protein activity.
Enzymes involved in post-translational modification include kinases, phosphatases,
glycosyltransferases, and prenyltransferases. The conformation of proteins may also be modified after
translation by the introduction and rearrangement of disulfide bonds (rearrangement catalyzed by
protein disulfide isomerase), the isomerization of proline sidechains by prolyl isomerase, and by
interactions with molecular chaperone proteins.

Proteins may also be cleaved by proteases. Such cleavage may result in activation,
inactivation, or complete degradation of the protein. Proteases include serine proteases, cysteine
proteases, aspartic proteases, and metalloproteases. Signal peptidase in the endoplasmic reticulum
(ER) lumen cleaves the signal peptide from membrane or secretory proteins that are imported into the
ER. Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major
pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS
mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory
proteins that control cellular processes such as gene transcription and cell cycle progression. In the
UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small heat stable
protein. Proteins involved in the UCS include ubiquitin-activating enzyme, ubiquitin-conjugating
enzymes, ubiquitin-ligases, and ubiquitin C-terminal hydrolases. The ubiquitinated protein is then
recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and
ubiquitin is released for reutilization by ubiquitin protease.
Lipid Metabolism

Lipids are water-insoluble, oily or greasy substances that are soluble in nonpolar solvents such
as chloroform or ether. Neutral fats (triacylglycerols) serve as major fuels and energy stores. Polar
lipids, such as phospholipids, sphingolipids, glycolipids, and cholesterol, are key structural components
of cell membranes.

Lipid metabolism is involved in human diseases and disorders. In the arterial disease
atherosclerosis, fatty lesions form on the inside of the arterial wall. These lesions promote the loss of
arterial flexibility and the formation of blood clots (Guyton, A.C. Textbook of Medical Physiology
(1991) W.B. Saunders Company, Philadelphia PA, pp. 760-763). In Tay-Sachs disease, the GM2
ganglioside (a sphingolipid) accumulates in lysosomes of the central nervous system due to a lack of the
enzyme N-acetylhexosaminidase. Patients suffer nervous system degeneration leading to early death
(Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, p.
2171). The Niemann-Pick diseases are caused by defects in lipid metabolism. Niemann-Pick diseases
types A and B are caused by accumulation of sphingomyelin (a sphingolipid) and other lipids in the
central nervous system due to a defect in the enzyme sphingomyelinase, leading to neurodegeneration
and lung disease. Niemann-Pick disease type C results from a defect in cholesterol transport, leading to
the accumulation of sphingomyelin and cholesterol in lysosomes and a secondary reduction in
sphingomyelinase activity. Neurological symptoms such as grand mal seizures, ataxia, and loss of
previously learned speech, manifest 1-2 years after birth. A mutation in the NPC protein, which
contains a putative cholesterol-sensing domain, was found in a mouse model of Niemann-Pick disease
type C (Fauci, supra, p. 2175; Loftus, S.K. et al. (1997) Science 277:232-235). (Lipid metabolism is
reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY; Lehninger,
A. (1982) Principles of Biochemistry, Worth Publishers, Inc., New York NY; and ExPASy
"Biochemical Pathways" index of Boehringer Mannheim World Wide Web site.)
Fatty Acid Synthesis

Fatty acids are long-chain organic acids with a single carboxyl group and a long non-polar
hydrocarbon tail. Long-chain fatty acids are essential components of glycolipids, phospholipids, and
cholesterol, which are building blocks for biological membranes, and of triglycerides, which are
biological fuel molecules. Long-chain fatty acids are also substrates for eicosanoid production, and are
important in the functional modification of certain complex carbohydrates and proteins. 16-carbon and
18-carbon fatty acids are the most common.

Fatty acid synthesis occurs in the cytoplasm. In the first step, acetyl-Coenzyme A (CoA)
carboxylase (ACC) synthesizes malonyl-CoA from acetyl-CoA and bicarbonate. The enzymes which
catalyze the remaining reactions are covalently linked into a single polypeptide chain, referred to as the
multifunctional enzyme fatty acid synthase (FAS). FAS catalyzes the synthesis of palmitate from
acetyl-CoA and malonyl-CoA. FAS contains acetyl transferase, malonyl transferase, β-ketoacyl
synthase, acyl carrier protein, β-ketoacyl reductase, dehydratase, enoyl reductase, and thioesterase
activities. The final product of the FAS reaction is the 16-carbon fatty acid palmitate. Further
elongation, as well as unsaturation, of palmitate by accessory enzymes of the ER produces the variety
of long chain fatty acids required by the individual cell. These enzymes include a NADH-cytochrome
b5 reductase, cytochrome b5, and a desaturase.
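
The carbon bookkeeping behind the FAS reaction can be made explicit with a short calculation. The
sketch below is an illustration under textbook assumptions that are not stated in the text above: one
acetyl-CoA serves as the two-carbon primer, each elongation cycle adds two carbons from one
malonyl-CoA, and the two reductase steps in each cycle each consume one NADPH. The function name
is hypothetical.

```python
# Illustrative sketch only; stoichiometry follows standard textbook assumptions.
def fas_inputs(chain_length: int = 16) -> dict[str, int]:
    """Count precursors for an even-chain saturated fatty acid built two carbons at a time."""
    if chain_length < 4 or chain_length % 2:
        raise ValueError("expected an even chain length of at least 4 carbons")
    cycles = (chain_length - 2) // 2      # elongation cycles after the acetyl primer
    return {
        "acetyl_CoA": 1,                  # two-carbon primer
        "malonyl_CoA": cycles,            # one per cycle, each adding two carbons
        "NADPH": 2 * cycles,              # two reductions per cycle
    }

palmitate = fas_inputs(16)
```

For the 16-carbon product palmitate this gives seven elongation cycles.
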
Phospholipid and Triacylglycerol Synthesis

Triacylglycerols, also known as triglycerides and neutral fats, are major energy stores in
animals. Triacylglycerols are esters of glycerol with three fatty acid chains. Glycerol-3-phosphate is
produced from dihydroxyacetone phosphate by the enzyme glycerol phosphate dehydrogenase or from
glycerol by glycerol kinase. Fatty acyl-CoAs are produced from fatty acids by fatty acyl-CoA
synthetases. Glycerol-3-phosphate is acylated with two fatty acyl-CoAs by the enzyme glycerol
phosphate acyltransferase to give phosphatidate. Phosphatidate phosphatase converts phosphatidate to
diacylglycerol, which is subsequently acylated to a triacylglycerol by the enzyme diglyceride
acyltransferase. Phosphatidate phosphatase and diglyceride acyltransferase form a triacylglycerol
synthetase complex bound to the ER membrane.

A major class of phospholipids are the phosphoglycerides, which are composed of a glycerol
backbone, two fatty acid chains, and a phosphorylated alcohol. Phosphoglycerides are components of
cell membranes. Principal phosphoglycerides are phosphatidyl choline, phosphatidyl ethanolamine,
phosphatidyl serine, phosphatidyl inositol, and diphosphatidyl glycerol. Many enzymes involved in
phosphoglyceride synthesis are associated with membranes (Meyers, R.A. (1995) Molecular Biology
and Biotechnology, VCH Publishers Inc., New York NY, pp. 494-501). Phosphatidate is converted to
CDP-diacylglycerol by the enzyme phosphatidate cytidylyltransferase (ExPASy ENZYME EC
2.7.7.41). Transfer of the diacylglycerol group from CDP-diacylglycerol to serine to yield phosphatidyl
serine, or to inositol to yield phosphatidyl inositol, is catalyzed by the enzymes CDP-diacylglycerol-
serine O-phosphatidyltransferase and CDP-diacylglycerol-inositol 3-phosphatidyltransferase,
respectively (ExPASy ENZYME EC 2.7.8.8; ExPASy ENZYME EC 2.7.8.11). The enzyme
phosphatidyl serine decarboxylase catalyzes the conversion of phosphatidyl serine to phosphatidyl
ethanolamine, using a pyruvate cofactor (Voelker, D.R. (1997) Biochim. Biophys. Acta 1348:236-244).
Phosphatidyl choline is formed using diet-derived choline by the reaction of CDP-choline with 1,2-
diacylglycerol, catalyzed by diacylglycerol cholinephosphotransferase (ExPASy ENZYME EC 2.7.8.2).
Sterol, Steroid, and Isoprenoid Metabolism

Cholesterol, composed of four fused hydrocarbon rings with an alcohol at one end, moderates
the fluidity of membranes in which it is incorporated. In addition, cholesterol is used in the synthesis of
steroid hormones such as cortisol, progesterone, estrogen, and testosterone. Bile salts derived from
cholesterol facilitate the digestion of lipids. Cholesterol in the skin forms a barrier that prevents excess
water evaporation from the body. Farnesyl and geranylgeranyl groups, which are derived from
cholesterol biosynthesis intermediates, are post-translationally added to signal transduction proteins
such as ras and protein-targeting proteins such as rab. These modifications are important for the
activities of these proteins (Guyton, supra; Stryer, supra, pp. 279-280, 691-702, 934).

Mammals obtain cholesterol derived from both de novo biosynthesis and the diet. The liver is
the major site of cholesterol biosynthesis in mammals. Two acetyl-CoA molecules initially condense to
form acetoacetyl-CoA, catalyzed by a thiolase. Acetoacetyl-CoA condenses with a third acetyl-CoA to
form hydroxymethylglutaryl-CoA (HMG-CoA), catalyzed by HMG-CoA synthase. Conversion of
HMG-CoA to cholesterol is accomplished via a series of enzymatic steps known as the mevalonate
pathway. The rate-limiting step is the conversion of HMG-CoA to mevalonate by HMG-CoA
reductase. The drug lovastatin, a potent inhibitor of HMG-CoA reductase, is given to patients to
reduce their serum cholesterol levels. Other mevalonate pathway enzymes include mevalonate kinase,
phosphomevalonate kinase, diphosphomevalonate decarboxylase, isopentenyldiphosphate isomerase,
dimethylallyl transferase, geranyl transferase, farnesyl-diphosphate farnesyltransferase, squalene
monooxygenase, lanosterol synthase, lathosterol oxidase, and 7-dehydrocholesterol reductase.

Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone,
aldosterone, estrogen, and testosterone. First, cholesterol is converted to pregnenolone by cholesterol
monooxygenases. The other steroid hormones are synthesized from pregnenolone by a series of
enzyme-catalyzed reactions including oxidations, isomerizations, hydroxylations, reductions, and
demethylations. Examples of these enzymes include steroid Δ-isomerase, 3β-hydroxy-Δ5-steroid
dehydrogenase, steroid 21-monooxygenase, steroid 19-hydroxylase, and 3β-hydroxysteroid
dehydrogenase. Cholesterol is also the precursor to vitamin D.

Numerous compounds contain 5-carbon isoprene units derived from the mevalonate pathway
intermediate isopentenyl pyrophosphate. Isoprenoid groups are found in vitamin K, ubiquinone, retinal,
dolichol phosphate (a carrier of oligosaccharides needed for N-linked glycosylation), and farnesyl and
geranylgeranyl groups that modify proteins. Enzymes involved include farnesyl transferase, polyprenyl
transferases, dolichyl phosphatase, and dolichyl kinase.
Sphingolipid Metabolism

Sphingolipids are an important class of membrane lipids that contain sphingosine, a long chain
amino alcohol. They are composed of one long-chain fatty acid, one polar head alcohol, and
sphingosine or sphingosine derivative. The three classes of sphingolipids are sphingomyelins,
cerebrosides, and gangliosides. Sphingomyelins, which contain phosphocholine or
phosphoethanolamine as their head group, are abundant in the myelin sheath surrounding nerve cells.
Galactocerebrosides, which contain a glucose or galactose head group, are characteristic of the brain.
Other cerebrosides are found in nonneural tissues. Gangliosides, whose head groups contain multiple
sugar units, are abundant in the brain, but are also found in nonneural tissues.

Sphingolipids are built on a sphingosine backbone. Sphingosine is acylated to ceramide by the
enzyme sphingosine acetyltransferase. Ceramide and phosphatidyl choline are converted to
sphingomyelin by the enzyme ceramide choline phosphotransferase. Cerebrosides are synthesized by
the linkage of glucose or galactose to ceramide by a transferase. Sequential addition of sugar residues
to ceramide by transferase enzymes yields gangliosides.

Eicosanoid Metabolism 

Eicosanoids, including prostaglandins, prostacyclin, thromboxanes, and leukotrienes, are 20-
carbon molecules derived from fatty acids. Eicosanoids are signaling molecules which have roles in
pain, fever, and inflammation. The precursor of all eicosanoids is arachidonate, which is generated
from phospholipids by phospholipase and from diacylglycerols by diacylglycerol lipase.
Leukotrienes are produced from arachidonate by the action of lipoxygenases. Prostaglandin synthase,
reductases, and isomerases are responsible for the synthesis of the prostaglandins. Prostaglandins have
roles in inflammation, blood flow, ion transport, synaptic transmission, and sleep. Prostacyclin and the
thromboxanes are derived from a precursor prostaglandin by the action of prostacyclin synthase and
thromboxane synthases, respectively.

Ketone Body Metabolism

Pairs of acetyl-CoA molecules derived from fatty acid oxidation in the liver can condense to
form acetoacetyl-CoA, which subsequently forms acetoacetate, D-3-hydroxybutyrate, and acetone.
These three products are known as ketone bodies. Enzymes involved in ketone body metabolism
include HMG-CoA synthetase, HMG-CoA cleavage enzyme, D-3-hydroxybutyrate dehydrogenase,
acetoacetate decarboxylase, and 3-ketoacyl-CoA transferase. Ketone bodies are a normal fuel supply
of the heart and renal cortex. Acetoacetate produced by the liver is transported to cells where the
acetoacetate is converted back to acetyl-CoA and enters the citric acid cycle. In times of starvation,
ketone bodies produced from stored triacylglycerols become an important fuel source, especially for the
brain. Abnormally high levels of ketone bodies are observed in diabetics. Diabetic coma can result if
ketone body levels become too great.
Lipid Mobilization 

Within cells, fatty acids are transported by cytoplasmic fatty acid binding proteins (Online
Mendelian Inheritance in Man (OMIM) *134650 Fatty Acid-Binding Protein 1, Liver; FABP1).
Diazepam binding inhibitor (DBI), also known as endozepine and acyl CoA-binding protein, is an
endogenous γ-aminobutyric acid (GABA) receptor ligand which is thought to down-regulate the effects
of GABA. DBI binds medium- and long-chain acyl-CoA esters with very high affinity and may
function as an intracellular carrier of acyl-CoA esters (OMIM *125950 Diazepam Binding Inhibitor;
DBI; PROSITE PDOC00686 Acyl-CoA-binding protein signature).

Fat stored in liver and adipose triglycerides may be released by hydrolysis and transported in
the blood. Free fatty acids are transported in the blood by albumin. Triacylglycerols and cholesterol
esters in the blood are transported in lipoprotein particles. The particles consist of a core of
hydrophobic lipids surrounded by a shell of polar lipids and apolipoproteins. The protein components
serve in the solubilization of hydrophobic lipids and also contain cell-targeting signals. Lipoproteins
include chylomicrons, chylomicron remnants, very-low-density lipoproteins (VLDL),
intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL).
There is a strong inverse correlation between the levels of plasma HDL and risk of premature coronary
heart disease.

Triacylglycerols in chylomicrons and VLDL are hydrolyzed by lipoprotein lipases that line
blood vessels in muscle and other tissues that use fatty acids. Cell surface LDL receptors bind LDL
particles which are then internalized by endocytosis. Absence of the LDL receptor, the cause of the
disease familial hypercholesterolemia, leads to increased plasma cholesterol levels and ultimately to
atherosclerosis. Plasma cholesteryl ester transfer protein mediates the transfer of cholesteryl esters
from HDL to apolipoprotein B-containing lipoproteins. Cholesteryl ester transfer protein is important
in the reverse cholesterol transport system and may play a role in atherosclerosis (Yamashita, S. et al.
(1997) Curr. Opin. Lipidol. 8:101-110). Macrophage scavenger receptors, which bind and internalize
modified lipoproteins, play a role in lipid transport and may contribute to atherosclerosis (Greaves,
D.R. et al. (1998) Curr. Opin. Lipidol. 9:425-432).

Proteins involved in cholesterol uptake and biosynthesis are tightly regulated in response to
cellular cholesterol levels. The sterol regulatory element binding protein (SREBP) is a sterol-responsive
transcription factor. Under normal cholesterol conditions, SREBP resides in the ER membrane. When
cholesterol levels are low, a regulated cleavage of SREBP occurs which releases the extracellular
domain of the protein. This cleaved domain is then transported to the nucleus where it activates the
transcription of the LDL receptor gene, and genes encoding enzymes of cholesterol synthesis, by
binding the sterol regulatory element (SRE) upstream of the genes (Yang, J. et al. (1995) J. Biol. Chem.
270:12152-12161). Regulation of cholesterol uptake and biosynthesis also occurs via the
oxysterol-binding protein (OSBP). OSBP is a high-affinity intracellular receptor for a variety of oxysterols that
down-regulate cholesterol synthesis and stimulate cholesterol esterification (Lagace, T.A. et al. (1997)
Biochem. J. 326:205-213).

Beta-oxidation

Mitochondrial and peroxisomal beta-oxidation enzymes degrade saturated and unsaturated fatty
acids by sequential removal of two-carbon units from CoA-activated fatty acids. The main
beta-oxidation pathway degrades both saturated and unsaturated fatty acids while the auxiliary pathway
performs additional steps required for the degradation of unsaturated fatty acids.
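
The sequential removal of two-carbon units can be illustrated with a small counting sketch. It
assumes a saturated, even-numbered, CoA-activated fatty acid and ignores the auxiliary steps
needed for unsaturated chains. The function name `beta_oxidation_yield` is hypothetical and is
used here for illustration only.

```python
def beta_oxidation_yield(carbons: int) -> dict:
    """Count beta-oxidation cycles and acetyl-CoA units for a saturated,
    even-chain fatty acyl-CoA (an illustrative simplification)."""
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("expects an even-numbered chain of at least 4 carbons")
    cycles = 0
    acetyl_coa = 0
    remaining = carbons
    # Each cycle removes one two-carbon unit from the chain as acetyl-CoA.
    while remaining > 2:
        remaining -= 2
        acetyl_coa += 1
        cycles += 1
    # The final two-carbon fragment left after the last cycle is itself acetyl-CoA.
    acetyl_coa += 1
    return {"cycles": cycles, "acetyl_coa": acetyl_coa}


print(beta_oxidation_yield(16))
```

Under these assumptions a 16-carbon chain such as palmitate passes through seven cycles and
yields eight acetyl-CoA units.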

The pathways of mitochondrial and peroxisomal beta-oxidation use similar enzymes, but have
different substrate specificities and functions. Mitochondria oxidize short-, medium-, and long-chain
fatty acids to produce energy for cells. Mitochondrial beta-oxidation is a major energy source for
cardiac and skeletal muscle. In liver, it provides ketone bodies to the peripheral circulation when
glucose levels are low as in starvation, endurance exercise, and diabetes (Eaton, S. et al. (1996)
Biochem. J. 320:345-357). Peroxisomes oxidize medium-, long-, and very-long-chain fatty acids,
dicarboxylic fatty acids, branched fatty acids, prostaglandins, xenobiotics, and bile acid intermediates.
The chief roles of peroxisomal beta-oxidation are to shorten toxic lipophilic carboxylic acids to
facilitate their excretion and to shorten very-long-chain fatty acids prior to mitochondrial beta-oxidation
(Mannaerts, G.P. and P.P. van Veldhoven (1993) Biochimie 75:147-158).

Enzymes involved in beta-oxidation include acyl CoA synthetase, carnitine acyltransferase,
acyl CoA dehydrogenases, enoyl CoA hydratases, L-3-hydroxyacyl CoA dehydrogenase, β-ketothiolase,
2,4-dienoyl CoA reductase, and isomerase.
Lipid Cleavage and Degradation 

Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Lysophospholipases
(LPLs) are widely distributed enzymes that metabolize intracellular lipids, and occur in numerous
isoforms. Small isoforms, approximately 15-30 kD, function as hydrolases; large isoforms, those
exceeding 60 kD, function both as hydrolases and transacylases. A particular substrate for LPLs,
lysophosphatidylcholine, causes lysis of cell membranes when it is formed or imported into a cell.
LPLs are regulated by lipid factors including acylcarnitine, arachidonic acid, and phosphatidic acid.
These lipid factors are signaling molecules important in numerous pathways, including the
inflammatory response. (Anderson, R. et al. (1994) Toxicol. Appl. Pharmacol. 125:176-183; Selle, H.
et al. (1993) Eur. J. Biochem. 212:411-416.)

The secretory phospholipase A2 (PLA2) superfamily comprises a number of heterogeneous
enzymes whose common feature is to hydrolyze the sn-2 fatty acid acyl ester bond of
phosphoglycerides. Hydrolysis of the glycerophospholipids releases free fatty acids and
lysophospholipids. PLA2 activity generates precursors for the biosynthesis of biologically active lipids,
hydroxy fatty acids, and platelet-activating factor. PLA2 hydrolysis of the sn-2 ester bond in
phospholipids generates free fatty acids, such as arachidonic acid, and lysophospholipids.
Carbon and Carbohydrate Metabolism

Carbohydrates, including sugars or saccharides, starch, and cellulose, are aldehyde or ketone
compounds with multiple hydroxyl groups. The importance of carbohydrate metabolism is
demonstrated by the sensitive regulatory system in place for maintenance of blood glucose levels. Two
pancreatic hormones, insulin and glucagon, promote increased glucose uptake and storage by cells, and
increased glucose release from cells, respectively. Carbohydrates have three important roles in
mammalian cells. First, carbohydrates are used as energy stores, fuels, and metabolic intermediates.
Carbohydrates are broken down to form energy in glycolysis and are stored as glycogen for later use.
Second, the sugars deoxyribose and ribose form part of the structural support of DNA and RNA,
respectively. Third, carbohydrate modifications are added to secreted and membrane proteins and lipids
as they traverse the secretory pathway. Cell surface carbohydrate-containing macromolecules,
including glycoproteins, glycolipids, and transmembrane proteoglycans, mediate adhesion with other
cells and with components of the extracellular matrix. The extracellular matrix is comprised of diverse
glycoproteins, glycosaminoglycans (GAGs), and carbohydrate-binding proteins which are secreted from
the cell and assembled into an organized meshwork in close association with the cell surface. The
interaction of the cell with the surrounding matrix profoundly influences cell shape, strength, flexibility,
motility, and adhesion. These dynamic properties are intimately associated with signal transduction
pathways controlling cell proliferation and differentiation, tissue construction, and embryonic
development.

Carbohydrate metabolism is altered in several disorders including diabetes mellitus,
hyperglycemia, hypoglycemia, galactosemia, galactokinase deficiency, and UDP-galactose-4-epimerase
deficiency (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New
York NY, pp. 2208-2209). Altered carbohydrate metabolism is associated with cancer. Reduced GAG
and proteoglycan expression is associated with human lung carcinomas (Nackaerts, K. et al. (1997) Int.
J. Cancer 74:335-345). The carbohydrate determinants sialyl Lewis A and sialyl Lewis X are
frequently expressed on human cancer cells (Kannagi, R. (1997) Glycoconj. J. 14:577-584).
Alterations of the N-linked carbohydrate core structure of cell surface glycoproteins are linked to colon
and pancreatic cancers (Schwarz, R.E. et al. (1996) Cancer Lett. 107:285-291). Reduced expression
of the Sda blood group carbohydrate structure in cell surface glycolipids and glycoproteins is observed
in gastrointestinal cancer (Dohi, T. et al. (1996) Int. J. Cancer 67:626-663). (Carbon and
carbohydrate metabolism is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company,
New York NY; Lehninger, A.L. (1982) Principles of Biochemistry, Worth Publishers Inc., New York
NY; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY.)
Glycolysis 

Enzymes of the glycolytic pathway convert the sugar glucose to pyruvate while simultaneously
producing ATP. The pathway also provides building blocks for the synthesis of cellular components
such as long-chain fatty acids. After glycolysis, pyruvate is converted to acetyl-Coenzyme A, which, in
aerobic organisms, enters the citric acid cycle. Glycolytic enzymes include hexokinase, phosphoglucose
isomerase, phosphofructokinase, aldolase, triose phosphate isomerase, glyceraldehyde 3-phosphate
dehydrogenase, phosphoglycerate kinase, phosphoglyceromutase, enolase, and pyruvate kinase. Of
these, phosphofructokinase, hexokinase, and pyruvate kinase are important in regulating the rate of
glycolysis.
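
As a rough illustration of how glycolysis produces ATP, the sketch below tallies the ATP consumed
and produced at the four ATP-coupled steps. The per-glucose values are standard textbook
stoichiometry and are an assumption added here; the passage above names the enzymes but gives no
numbers. The names `ATP_STEPS` and `net_atp_per_glucose` are hypothetical.

```python
# ATP change per glucose at each ATP-coupled glycolytic step.
# Values are textbook stoichiometry (an assumption, not taken from the text).
ATP_STEPS = {
    "hexokinase": -1,               # glucose -> glucose-6-phosphate
    "phosphofructokinase": -1,      # fructose-6-phosphate -> fructose-1,6-bisphosphate
    "phosphoglycerate kinase": +2,  # one ATP per triose, two trioses per glucose
    "pyruvate kinase": +2,          # one ATP per triose, two trioses per glucose
}


def net_atp_per_glucose(steps: dict = ATP_STEPS) -> int:
    """Sum the ATP consumed (negative) and produced (positive) per glucose."""
    return sum(steps.values())


print(net_atp_per_glucose())
```

Under this accounting, two ATP are invested and four are recovered, for a net gain of two ATP
per glucose.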
Gluconeogenesis 

Gluconeogenesis is the synthesis of glucose from noncarbohydrate precursors such as lactate
and amino acids. The pathway, which functions mainly in times of starvation and intense exercise,
occurs mostly in the liver and kidney. Responsible enzymes include pyruvate carboxylase,
phosphoenolpyruvate carboxykinase, fructose 1,6-bisphosphatase, and glucose-6-phosphatase.
Pentose Phosphate Pathway

Pentose phosphate pathway enzymes are responsible for generating the reducing agent
NADPH, while at the same time oxidizing glucose-6-phosphate to ribose-5-phosphate.
Ribose-5-phosphate and its derivatives become part of important biological molecules such as ATP, Coenzyme
A, NAD+, FAD, RNA, and DNA. The pentose phosphate pathway has both oxidative and
non-oxidative branches. The oxidative branch steps, which are catalyzed by the enzymes
glucose-6-phosphate dehydrogenase, lactonase, and 6-phosphogluconate dehydrogenase, convert
glucose-6-phosphate and NADP+ to ribulose-5-phosphate and NADPH. The non-oxidative branch steps, which
are catalyzed by the enzymes phosphopentose isomerase, phosphopentose epimerase, transketolase, and
transaldolase, allow the interconversion of three-, four-, five-, six-, and seven-carbon sugars.
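
The oxidative branch can be checked with a simple carbon-balance sketch. It assumes the standard
textbook stoichiometry, in which each six-carbon glucose-6-phosphate yields one five-carbon
ribulose 5-phosphate, one CO2, and two NADPH; these numbers and the function names are
illustrative assumptions rather than part of the disclosure.

```python
# Carbon atoms per molecule (textbook values, stated here as assumptions).
CARBONS = {
    "glucose-6-phosphate": 6,
    "ribulose-5-phosphate": 5,
    "CO2": 1,
}


def oxidative_branch(g6p_molecules: int) -> dict:
    """Products of the oxidative branch, assuming
    1 glucose-6-phosphate + 2 NADP+ -> 1 ribulose-5-phosphate + 1 CO2 + 2 NADPH."""
    return {
        "ribulose-5-phosphate": g6p_molecules,
        "CO2": g6p_molecules,
        "NADPH": 2 * g6p_molecules,
    }


def carbon_balanced(g6p_molecules: int) -> bool:
    """Check that carbon entering as glucose-6-phosphate equals carbon leaving."""
    products = oxidative_branch(g6p_molecules)
    carbon_in = g6p_molecules * CARBONS["glucose-6-phosphate"]
    carbon_out = sum(
        count * CARBONS[name] for name, count in products.items() if name in CARBONS
    )
    return carbon_in == carbon_out


print(oxidative_branch(3), carbon_balanced(3))
```

The balance holds because the sixth carbon of glucose-6-phosphate leaves as CO2 at the
6-phosphogluconate dehydrogenase step.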
Glucuronate Metabolism

Glucuronate is a monosaccharide which, in the form of D-glucuronic acid, is found in the
GAGs chondroitin and dermatan. D-glucuronic acid is also important in the detoxification and
excretion of foreign organic compounds such as phenol. Enzymes involved in glucuronate metabolism
include UDP-glucose dehydrogenase and glucuronate reductase.
Disaccharide Metabolism 

Disaccharides must be hydrolyzed to monosaccharides to be digested. Lactose, a disaccharide
found in milk, is hydrolyzed to galactose and glucose by the enzyme lactase. Maltose is derived from
plant starch and is hydrolyzed to glucose by the enzyme maltase. Sucrose is derived from plants and is
hydrolyzed to glucose and fructose by the enzyme sucrase. Trehalose, a disaccharide found mainly in
insects and mushrooms, is hydrolyzed to glucose by the enzyme trehalase (OMIM *275360 Trehalase;
Ruf, J. et al. (1990) J. Biol. Chem. 265:15034-15039). Lactase, maltase, sucrase, and trehalase are
bound to mucosal cells lining the small intestine, where they participate in the digestion of dietary
disaccharides. The enzyme lactose synthase, composed of the catalytic subunit galactosyltransferase
and the modifier subunit α-lactalbumin, converts UDP-galactose and glucose to lactose in the mammary
glands.

Glycogen, Starch, and Chitin Metabolism

Glycogen is the storage form of carbohydrates in mammals. Mobilization of glycogen
maintains glucose levels between meals and during muscular activity. Glycogen is stored mainly in the
liver and in skeletal muscle in the form of cytoplasmic granules. These granules contain enzymes that
catalyze the synthesis and degradation of glycogen, as well as enzymes that regulate these processes.
Enzymes that catalyze the degradation of glycogen include glycogen phosphorylase, a transferase,
α-1,6-glucosidase, and phosphoglucomutase. Enzymes that catalyze the synthesis of glycogen include
UDP-glucose pyrophosphorylase, glycogen synthetase, a branching enzyme, and nucleoside
diphosphokinase. The enzymes of glycogen synthesis and degradation are tightly regulated by the
hormones insulin, glucagon, and epinephrine. Starch, a plant-derived polysaccharide, is hydrolyzed to
maltose, maltotriose, and α-dextrin by α-amylase, an enzyme secreted by the salivary glands and
pancreas. Chitin is a polysaccharide found in insects and crustacea. A chitotriosidase is secreted by
macrophages and may play a role in the degradation of chitin-containing pathogens (Boot, R.G. et al.
(1995) J. Biol. Chem. 270:26252-26256).
Peptidoglycans and Glycosaminoglycans

Glycosaminoglycans (GAGs) are anionic linear unbranched polysaccharides composed of
repetitive disaccharide units. These repetitive units contain a derivative of an amino sugar, either
glucosamine or galactosamine. GAGs exist free or as part of proteoglycans, large molecules composed
of a core protein attached to one or more GAGs. GAGs are found on the cell surface, inside cells, and
in the extracellular matrix. Changes in GAG levels are associated with several autoimmune diseases
including autoimmune thyroid disease, autoimmune diabetes mellitus, and systemic lupus
erythematosus (Hansen, C. et al. (1996) Clin. Exp. Rheumatol. 14 (Suppl. 15):S59-S67). GAGs include
chondroitin sulfate, keratan sulfate, heparin, heparan sulfate, dermatan sulfate, and hyaluronan.

The GAG hyaluronan (HA) is found in the extracellular matrix of many cells, especially in soft
connective tissues, and is abundant in synovial fluid (Pitsillides, A.A. et al. (1993) Int. J. Exp. Pathol.
74:27-34). HA seems to play important roles in cell regulation, development, and differentiation
(Laurent, T.C. and J.R. Fraser (1992) FASEB J. 6:2397-2404). Hyaluronidase is an enzyme that
degrades HA to oligosaccharides. Hyaluronidases may function in cell adhesion, infection,
angiogenesis, signal transduction, reproduction, cancer, and inflammation.

Proteoglycans, also known as peptidoglycans, are found in the extracellular matrix of
connective tissues such as cartilage and are essential for distributing the load in weight-bearing joints.
Cell-surface-attached proteoglycans anchor cells to the extracellular matrix. Both extracellular and
cell-surface proteoglycans bind growth factors, facilitating their binding to cell-surface receptors and
subsequent triggering of signal transduction pathways.
Amino Acid and Nitrogen Metabolism 

NH4+ is assimilated into amino acids by the actions of two enzymes, glutamate
dehydrogenase and glutamine synthetase. The carbon skeletons of amino acids come from the
intermediates of glycolysis, the pentose phosphate pathway, or the citric acid cycle. Of the twenty
amino acids used in proteins, humans can synthesize only eleven (nonessential amino acids). The
remaining nine must come from the diet (essential amino acids). Enzymes involved in nonessential
amino acid biosynthesis include glutamate kinase dehydrogenase, pyrroline carboxylate reductase,
asparagine synthetase, phenylalanine oxygenase, methionine adenosyltransferase,
adenosylhomocysteinase, cystathionine β-synthase, cystathionine γ-lyase, phosphoglycerate
dehydrogenase, phosphoserine transaminase, phosphoserine phosphatase, serine
hydroxylmethyltransferase, and glycine synthase.

Metabolism of amino acids takes place almost entirely in the liver, where the amino group is
removed by aminotransferases (transaminases), for example, alanine aminotransferase. The amino
group is transferred to α-ketoglutarate to form glutamate. Glutamate dehydrogenase converts
glutamate to NH4+ and α-ketoglutarate. NH4+ is converted to urea by the urea cycle which is
catalyzed by the enzymes arginase, ornithine transcarbamoylase, arginosuccinate synthetase, and
arginosuccinase. Carbamoyl phosphate synthetase is also involved in urea formation. Enzymes
involved in the metabolism of the carbon skeleton of amino acids include serine dehydratase,
asparaginase, glutaminase, propionyl CoA carboxylase, methylmalonyl CoA mutase, branched-chain
α-keto dehydrogenase complex, isovaleryl CoA dehydrogenase, β-methylcrotonyl CoA carboxylase,
phenylalanine hydroxylase, p-hydroxylphenylpyruvate hydroxylase, and homogentisate oxidase.

Polyamines, which include spermidine, putrescine, and spermine, bind tightly to nucleic acids
and are abundant in rapidly proliferating cells. Enzymes involved in polyamine synthesis include
ornithine decarboxylase.

Diseases involved in amino acid and nitrogen metabolism include hyperammonemia, 
carbamoyl phosphate synthetase deficiency, urea cycle enzyme deficiencies, methylmalonic aciduria, 
maple syrup disease, alcaptonuria, and phenylketonuria. 
Energy Metabolism 

Cells derive energy from metabolism of ingested compounds that may be roughly categorized 
as carbohydrates, fats, or proteins. Energy is also stored in polymers such as triglycerides (fats) and 
glycogen (carbohydrates). Metabolism proceeds along separate reaction pathways connected by key 
intermediates such as acetyl coenzyme A (acetyl-CoA). Metabolic pathways feature anaerobic and 
aerobic degradation, coupled with the energy-requiring reactions such as phosphorylation of 
adenosine diphosphate (ADP) to the triphosphate (ATP) or analogous phosphorylations of guanosine
(GDP/GTP), uridine (UDP/UTP), or cytidine (CDP/CTP). Subsequent dephosphorylation of the
triphosphate drives reactions needed for cell maintenance, growth, and proliferation.

Digestive enzymes convert carbohydrates and sugars to glucose; fructose and galactose are
converted in the liver to glucose. Enzymes involved in these conversions include
galactose-1-phosphate uridyl transferase and UDP-galactose-4 epimerase. In the cytoplasm, glycolysis converts
glucose to pyruvate in a series of reactions coupled to ATP synthesis.

Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via
the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and
dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase,
aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including
transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate
dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and
GTP. In oxidative phosphorylation, the transport of electrons from NADH and FADH2 to oxygen by
dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in
the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP
synthesis include the F0F1 ATPase complex, ubiquinone (CoQ)-cytochrome c reductase, ubiquinone
reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
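
The products formed when acetyl-CoA is oxidized in the citric acid cycle can be tallied with a
short sketch. The per-turn yields used below (three NADH, one FADH2, one GTP, and two CO2 per
acetyl-CoA) are standard textbook values added here as an assumption; the names `PER_ACETYL_COA`
and `cycle_products` are hypothetical.

```python
# Products of one turn of the citric acid cycle, per acetyl-CoA oxidized.
# Textbook stoichiometry, stated as an assumption (the text gives no numbers).
PER_ACETYL_COA = {"NADH": 3, "FADH2": 1, "GTP": 1, "CO2": 2}


def cycle_products(acetyl_coa: int) -> dict:
    """Scale the per-turn yields by the number of acetyl-CoA units oxidized."""
    return {name: count * acetyl_coa for name, count in PER_ACETYL_COA.items()}


# One glucose gives two pyruvate and therefore two acetyl-CoA.
print(cycle_products(2))
```

The NADH and FADH2 counted here are the electron donors that feed oxidative phosphorylation.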

Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Glycerol is then
phosphorylated to glycerol-3-phosphate by glycerol kinase and glycerol phosphate dehydrogenase,
and degraded by glycolysis. Fatty acids are transported into the mitochondria as fatty
acyl-carnitine esters and undergo oxidative degradation.

In addition to metabolic disorders such as diabetes and obesity, disorders of energy
metabolism are associated with cancers (Dorward, A. et al. (1997) J. Bioenerg. Biomembr.
29:385-392), autism (Lombard, J. (1998) Med. Hypotheses 50:497-500), neurodegenerative disorders (Alexi,
T. et al. (1998) Neuroreport 9:R57-64), and neuromuscular disorders (DiMauro, S. et al. (1998)
Biochim. Biophys. Acta 1366:199-210). The myocardium is heavily dependent on oxidative
metabolism, so metabolic dysfunction often leads to heart disease (DiMauro, S. and M. Hirano (1998)
Curr. Opin. Cardiol. 13:190-197).

For a review of energy metabolism enzymes and intermediates, see Stryer, L. et al. (1995)
Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 443-652. For a review of energy
metabolism regulation, see Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American
Books, New York NY, pp. 744-770.
Cofactor Metabolism 

Cofactors, including coenzymes and prosthetic groups, are small molecular weight inorganic
or organic compounds that are required for the action of an enzyme. Many cofactors contain vitamins
as a component. Cofactors include thiamine pyrophosphate, flavin adenine dinucleotide, flavin
mononucleotide, nicotinamide adenine dinucleotide, pyridoxal phosphate, coenzyme A,
tetrahydrofolate, lipoamide, and heme. The vitamins biotin and cobalamin are associated with
enzymes as well. Heme, a prosthetic group found in myoglobin and hemoglobin, consists of a
protoporphyrin group bound to iron. Porphyrin groups contain four substituted pyrroles covalently
joined in a ring, often with a bound metal atom. Enzymes involved in porphyrin synthesis include
δ-aminolevulinate synthase, δ-aminolevulinate dehydrase, porphobilinogen deaminase, and cosynthase.
Deficiencies in heme formation cause porphyrias. Heme is broken down as a part of erythrocyte
turnover. Enzymes involved in heme degradation include heme oxygenase and biliverdin reductase.

Iron is a required cofactor for many enzymes. Besides the heme-containing enzymes, iron is 
found in iron-sulfur clusters in proteins including aconitase, succinate dehydrogenase, and NADH-Q 
reductase. Iron is transported in the blood by the protein transferrin. Binding of transferrin to the
transferrin receptor on cell surfaces allows uptake by receptor mediated endocytosis. Cytosolic iron is 
bound to ferritin protein. 

A molybdenum-containing cofactor (molybdopterin) is found in enzymes including sulfite
oxidase, xanthine dehydrogenase, and aldehyde oxidase. Molybdopterin biosynthesis is performed by 
two molybdenum cofactor synthesizing enzymes. Deficiencies in these enzymes cause mental 
retardation and lens dislocation. Other diseases caused by defects in cofactor metabolism include 
pernicious anemia and methylmalonic aciduria. 
Secretion and Trafficking 

Eukaryotic cells are bound by a lipid bilayer membrane and subdivided into functionally
distinct, membrane bound compartments. The membranes maintain the essential differences between
the cytosol, the extracellular environment, and the lumenal space of each intracellular organelle. As
lipid membranes are highly impermeable to most polar molecules, transport of essential nutrients,
metabolic waste products, cell signaling molecules, macromolecules and proteins across lipid
membranes and between organelles must be mediated by a variety of transport-associated molecules.
Protein Trafficking

In eukaryotes, some proteins are synthesized on ER-bound ribosomes, co-translationally
imported into the ER, delivered from the ER to the Golgi complex for post-translational processing and
sorting, and transported from the Golgi to specific intracellular and extracellular destinations. All cells
possess a constitutive transport process which maintains homeostasis between the cell and its
environment. In many differentiated cell types, the basic machinery is modified to carry out specific
transport functions. For example, in endocrine glands, hormones and other secreted proteins are
packaged into secretory granules for regulated exocytosis to the cell exterior. In macrophages, foreign
extracellular material is engulfed (phagocytosis) and delivered to lysosomes for degradation. In fat and
muscle cells, glucose transporters are stored in vesicles which fuse with the plasma membrane only in
response to insulin stimulation.
The Secretory Pathway 

Synthesis of most integral membrane proteins, secreted proteins, and proteins destined for the
lumen of a particular organelle occurs on ER-bound ribosomes. These proteins are co-translationally
imported into the ER. The proteins leave the ER via membrane-bound vesicles which bud off the ER at
specific sites and fuse with each other (homotypic fusion) to form the ER-Golgi Intermediate
Compartment (ERGIC). The ERGIC matures progressively through the cis, medial, and trans cisternal
stacks of the Golgi, modifying the enzyme composition by retrograde transport of specific Golgi
enzymes. In this way, proteins moving through the Golgi undergo post-translational modification, such
as glycosylation. The final Golgi compartment is the Trans-Golgi Network (TGN), where both
membrane and lumenal proteins are sorted for their final destination. Transport vesicles destined for
intracellular compartments, such as the lysosome, bud off the TGN. What remains is a secretory
vesicle which contains proteins destined for the plasma membrane, such as receptors, adhesion
molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive
enzymes. Secretory vesicles eventually fuse with the plasma membrane (Glick, B.S. and V. Malhotra
(1998) Cell 95:883-889).

The secretory process can be constitutive or regulated. Most cells have a constitutive pathway
for secretion, whereby vesicles derived from maturation of the TGN require no specific signal to fuse
with the plasma membrane. In many cells, such as endocrine cells, digestive cells, and neurons, vesicle
pools derived from the TGN collect in the cytoplasm and do not fuse with the plasma membrane until
they are directed to by a specific signal.

Endocytosis

Endocytosis, wherein cells internalize material from the extracellular environment, is essential
for transmission of neuronal, metabolic, and proliferative signals; uptake of many essential nutrients;
and defense against invading organisms. Most cells exhibit two forms of endocytosis. The first,
phagocytosis, is an actin-driven process exemplified in macrophages and neutrophils. Material to be
endocytosed contacts numerous cell surface receptors which stimulate the plasma membrane to extend
and surround the particle, enclosing it in a membrane-bound phagosome. In the mammalian immune
system, IgG-coated particles bind Fc receptors on the surface of phagocytic leukocytes. Activation of
the Fc receptors initiates a signal cascade involving src-family cytosolic kinases and the monomeric
GTP-binding (G) protein Rho. The resulting actin reorganization leads to phagocytosis of the particle.
This process is an important component of the humoral immune response, allowing the processing and
presentation of bacterial-derived peptides to antigen-specific T-lymphocytes.
The second form of endocytosis, pinocytosis, is a more generalized uptake of material from the
external milieu. Like phagocytosis, pinocytosis is activated by ligand binding to cell surface receptors.
Activation of individual receptors stimulates an internal response that includes coalescence of the
receptor-ligand complexes and formation of clathrin-coated pits. Invagination of the plasma membrane
at clathrin-coated pits produces an endocytic vesicle within the cell cytoplasm. These vesicles undergo
homotypic fusion to form an early endosomal (EE) compartment. The tubulovesicular EE serves as a
sorting site for incoming material. ATP-driven proton pumps in the EE membrane lower the pH of the
EE lumen (pH 6.3-6.8). The acidic environment causes many ligands to dissociate from their receptors.
The receptors, along with membrane and other integral membrane proteins, are recycled back to the
plasma membrane by budding off the tubular extensions of the EE in recycling vesicles (RV). This
selective removal of recycled components produces a carrier vesicle containing ligand and other
material from the external environment. The carrier vesicle fuses with TGN-derived vesicles which
contain hydrolytic enzymes. The acidic environment of the resulting late endosome (LE) activates the
hydrolytic enzymes which degrade the ligands and other material. As digestion takes place, the LE
fuses with the lysosome where digestion is completed (Mellman, I. (1996) Annu. Rev. Cell Dev. Biol.
12:575-625).
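
To give a sense of scale for the early endosomal pH range quoted above, the sketch below converts
pH to proton concentration using the definition pH = -log10([H+]). Only the two pH values from
the text are used; the function names are hypothetical and added for illustration.

```python
def proton_concentration(ph: float) -> float:
    """Molar H+ concentration from pH, using pH = -log10([H+])."""
    return 10.0 ** (-ph)


def fold_difference(ph_high: float, ph_low: float) -> float:
    """How many times more concentrated H+ is at ph_low than at ph_high."""
    return proton_concentration(ph_low) / proton_concentration(ph_high)


# Compare the two ends of the early endosome range given in the text.
print(round(fold_difference(6.8, 6.3), 2))
```

Because the pH scale is logarithmic, the half-unit span of the quoted range corresponds to
roughly a threefold difference in proton concentration.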

Recycling vesicles may return directly to the plasma membrane. Receptors internalized and
returned directly to the plasma membrane have a turnover rate of 2-3 minutes. Some RVs undergo
microtubule-directed relocation to a perinuclear site, from which they then return to the plasma
membrane. Receptors following this route have a turnover rate of 5-10 minutes. Still other RVs are
retained within the cell until an appropriate signal is received (Mellman, supra; and James, D.E. et al.
(1994) Trends Cell Biol. 4:120-126).
Vesicle Formation 

Several steps in the transit of material along the secretory and endocytic pathways require the
formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum
(tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane
(PM), and tubular extensions of the endosomes. The process begins with the budding of a vesicle out of
the donor membrane. The membrane-bound vesicle contains proteins to be transported and is
surrounded by a protective coat made up of protein subunits recruited from the cytosol. The initial
budding and coating processes are controlled by a cytosolic ras-like GTP-binding protein,
ADP-ribosylating factor (Arf), and adapter proteins (AP). Different isoforms of both Arf and AP are
involved at different sites of budding. Another small G-protein, dynamin, forms a ring complex around
the neck of the forming vesicle and may provide the mechanochemical force to accomplish the final step
of the budding process. The coated vesicle complex is then transported through the cytosol. During the
transport process, Arf-bound GTP is hydrolyzed to GDP and the coat dissociates from the transport
vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254). Two different classes of coat protein
have also been identified. Clathrin coats form on the TGN and PM surfaces, whereas coatomer or COP
coats form on the ER and Golgi. COP coats can further be distinguished as COPI, involved in
retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde
traffic from the ER to the Golgi (Mellman, supra). The COP coat consists of two major components, a
G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven
proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. (Harter, C. and F.T.
Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654.)
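The coat assignments described above can be summarized as a small lookup. The following Python sketch is illustrative only: the compartment labels, the table layout, and the function name `coat_for` are assumptions made for this example, and the table is a simplification that encodes only the routes named in the text.

```python
# Donor membranes on which clathrin coats form, per the text (TGN and PM).
CLATHRIN_DONORS = {"TGN", "PM"}

# COP coats by (donor, target) route; a simplification of the text.
COP_COAT_BY_ROUTE = {
    ("ER", "Golgi"): "COPII",    # anterograde traffic from the ER to the Golgi
    ("Golgi", "ER"): "COPI",     # retrograde traffic from the Golgi to the ER
    ("Golgi", "Golgi"): "COPI",  # retrograde traffic through the Golgi
}


def coat_for(donor, target):
    """Return the coat class for a route, or None if the text names none."""
    if donor in CLATHRIN_DONORS:
        return "clathrin"
    return COP_COAT_BY_ROUTE.get((donor, target))


er_to_golgi = coat_for("ER", "Golgi")
golgi_to_er = coat_for("Golgi", "ER")
from_plasma_membrane = coat_for("PM", "endosome")
```

Routes that the text does not describe return `None` rather than a guessed coat.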
Membrane Fusion 

Transport vesicles undergo homotypic or heterotypic fusion in the secretory and endocytotic
pathways. Molecules required for appropriate targeting and fusion of vesicles with their target
membrane include proteins incorporated in the vesicle membrane, the target membrane, and proteins
recruited from the cytosol. During budding of the vesicle from the donor compartment, an integral
membrane protein, VAMP (vesicle-associated membrane protein) is incorporated into the vesicle. Soon
after the vesicle uncoats, a cytosolic prenylated GTP-binding protein, Rab (a member of the Ras
superfamily), is inserted into the vesicle membrane. GTP-bound Rab proteins are directed into nascent
transport vesicles where they interact with VAMP. Following vesicle transport, GTPase activating
proteins (GAPs) in the target membrane convert Rab proteins to the GDP-bound form. A cytosolic
protein, guanine-nucleotide dissociation inhibitor (GDI) helps return GDP-bound Rab proteins to their
membrane of origin. Several Rab isoforms have been identified and appear to associate with specific
compartments within the cell. Rab proteins appear to play a role in mediating the function of a viral
gene, Rev, which is essential for replication of HIV-1, the virus responsible for AIDS (Flavell, R.A. et
al. (1996) Proc. Natl. Acad. Sci. USA 93:4421-4424).

Docking of the transport vesicle with the target membrane involves the formation of a complex
between the vesicle SNAP receptor (v-SNARE), target membrane (t-) SNAREs, and certain other
membrane and cytosolic proteins. Many of these other proteins have been identified although their
exact functions in the docking complex remain uncertain (Tellam, J.T. et al. (1995) J. Biol. Chem.
270:5857-63; and Hata, Y. and T.C. Sudhof (1995) J. Biol. Chem. 270:13022-28). N-ethylmaleimide
sensitive factor (NSF) and soluble NSF-attachment protein (α-SNAP and β-SNAP) are two such
proteins that are conserved from yeast to man and function in most intracellular membrane fusion
reactions. Sec1 represents a family of yeast proteins that function at many different stages in the
secretory pathway including membrane fusion. Recently, mammalian homologs of Sec1, called
Munc-18 proteins, have been identified (Katagiri, H. et al. (1995) J. Biol. Chem. 270:4963-4966; Hata
et al. supra).

The SNARE complex involves three SNARE molecules, one in the vesicular membrane and
two in the target membrane. Synaptotagmin is an integral membrane protein in the synaptic vesicle
which associates with the t-SNARE syntaxin in the docking complex. Synaptotagmin binds calcium in
a complex with negatively charged phospholipids, which allows the cytosolic SNAP protein to displace
synaptotagmin from syntaxin and fusion to occur. Thus, synaptotagmin is a negative regulator of
fusion in the neuron (Littleton, J.T. et al. (1993) Cell 74:1125-1134). The most abundant membrane
protein of synaptic vesicles appears to be the glycoprotein synaptophysin, a 38 kDa protein with four
transmembrane domains.

Specificity between a vesicle and its target is derived from the v-SNARE, t-SNAREs, and
associated proteins involved. Different isoforms of SNAREs and Rabs show distinct cellular and
subcellular distributions. VAMP-1/synaptobrevin, membrane-anchored synaptosome-associated
protein of 25 kDa (SNAP-25), syntaxin-1, Rab3A, Rab15, and Rab23 are predominantly expressed in
the brain and nervous system. Different syntaxin, VAMP, and Rab proteins are associated with
distinct subcellular compartments and their vesicular carriers.
Nuclear Transport

Transport of proteins and RNA between the nucleus and the cytoplasm occurs through nuclear
pore complexes (NPCs). NPC-mediated transport occurs in both directions through the nuclear
envelope. All nuclear proteins are imported from the cytoplasm, their site of synthesis. tRNA and
mRNA are exported from the nucleus, their site of synthesis, to the cytoplasm, their site of function.
Processing of small nuclear RNAs involves export into the cytoplasm, assembly with proteins and
modifications such as hypermethylation to produce small nuclear ribonuclear proteins (snRNPs), and
subsequent import of the snRNPs back into the nucleus. The assembly of ribosomes requires the initial
import of ribosomal proteins from the cytoplasm, their incorporation with RNA into ribosomal
subunits, and export back to the cytoplasm (Görlich, D. and I.W. Mattaj (1996) Science
271:1513-1518).

The transport of proteins and mRNAs across the NPC is selective, dependent on nuclear
localization signals, and generally requires association with nuclear transport factors. Nuclear
localization signals (NLS) consist of short stretches of amino acids enriched in basic residues. NLS are
found on proteins that are targeted to the nucleus, such as the glucocorticoid receptor. The NLS is
recognized by the NLS receptor, importin, which then interacts with the monomeric GTP-binding
protein Ran. This NLS protein/receptor/Ran complex navigates the nuclear pore with the help of the
homodimeric protein nuclear transport factor 2 (NTF2). NTF2 binds to the GDP-bound form of Ran and
to multiple proteins of the nuclear pore complex containing FXFG repeat motifs, such as p62
(Paschal, B. et al. (1997) J. Biol. Chem. 272:21534-21539; and Wong, D.H. et al. (1997) Mol. Cell
Biol. 17:3755-3767). Some proteins are dissociated before nuclear mRNAs are transported across the
NPC while others are dissociated shortly after nuclear mRNA transport across the NPC and are
reimported into the nucleus.
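The description of an NLS as a short stretch of amino acids enriched in basic residues can be illustrated with a simple sequence scan. The Python sketch below is an illustration only and not a validated NLS predictor: the window length of 7, the threshold of 5 basic residues (lysine or arginine), and the function name `find_basic_stretches` are assumptions chosen for this example. The test sequence is hypothetical apart from the embedded SV40 large T antigen NLS (PKKKRKV).

```python
BASIC_RESIDUES = frozenset("KR")  # lysine (K) and arginine (R)


def find_basic_stretches(sequence, window=7, min_basic=5):
    """Return (start, segment) pairs for windows enriched in basic residues.

    window and min_basic are illustrative assumptions, not values
    taken from the specification.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        basic_count = sum(1 for residue in segment if residue in BASIC_RESIDUES)
        if basic_count >= min_basic:
            hits.append((start, segment))
    return hits


# Hypothetical sequence containing the SV40 large T antigen NLS.
example = "MDEAPKKKRKVEDL"
candidates = find_basic_stretches(example)
```

Overlapping windows around the same basic cluster are all reported; a practical tool would merge them and apply additional criteria.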
Disease Correlation

The etiology of numerous human diseases and disorders can be attributed to defects in the
transport or secretion of proteins. For example, abnormal hormonal secretion is linked to disorders
such as diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease
and goiter (thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone,
ACTH). Moreover, cancer cells secrete excessive amounts of hormones or other biologically active
peptides. Disorders related to excessive secretion of biologically active peptides by tumor cells include
fasting hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension
due to increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal
medulla and sympathetic paraganglia; and carcinoid syndrome, which is characterized by abdominal
cramps, diarrhea, and valvular heart disease caused by excessive amounts of vasoactive substances
such as serotonin, bradykinin, histamine, prostaglandins, and polypeptide hormones, secreted from
intestinal tumors. Biologically active peptides that are ectopically synthesized in and secreted from
tumor cells include ACTH and vasopressin (lung and pancreatic cancers); parathyroid hormone (lung
and bladder cancers); calcitonin (lung and breast cancers); and thyroid-stimulating hormone (medullary
thyroid carcinoma). Such peptides may be useful as diagnostic markers for tumorigenesis (Schwartz,
M.Z. (1997) Semin. Pediatr. Surg. 3:141-146; and Said, S.I. and G.R. Faloona (1975) N. Engl. J. Med.
293:155-160).

Defective nuclear transport may play a role in cancer. The BRCA1 protein contains three
potential NLSs which interact with importin alpha, and is transported into the nucleus by the
importin/NPC pathway. In breast cancer cells the BRCA1 protein is aberrantly localized in the
cytoplasm. The mislocation of the BRCA1 protein in breast cancer cells may be due to a defect in the
NPC nuclear transport pathway (Chen, C.F. et al. (1996) J. Biol. Chem. 271:32863-32868).

It has been suggested that in some breast cancers, the tumor-suppressing activity of p53 is
inactivated by the sequestration of the protein in the cytoplasm, away from its site of action in the cell
nucleus. Cytoplasmic wild-type p53 was also found in human cervical carcinoma cell lines. (Moll,
U.M. et al. (1992) Proc. Natl. Acad. Sci. USA 89:7262-7266; and Liang, X.H. et al. (1993) Oncogene
8:2645-2652.)
Environmental Responses 

Organisms respond to the environment by a number of pathways. Heat shock proteins,
including hsp70, hsp60, hsp90, and hsp40, assist organisms in coping with heat damage to cellular
proteins.

Aquaporins (AQP) are channels that transport water and, in some cases, nonionic small solutes
such as urea and glycerol. Water movement is important for a number of physiological processes
including renal fluid filtration, aqueous humor generation in the eye, cerebrospinal fluid production in
the brain, and appropriate hydration of the lung. Aquaporins are members of the major intrinsic protein
(MIP) family of membrane transporters (King, L.S. and P. Agre (1996) Annu. Rev. Physiol.
58:619-648; Ishibashi, K. et al. (1997) J. Biol. Chem. 272:20782-20786). The study of aquaporins may have
relevance to understanding edema formation and fluid balance in both normal physiology and disease
states (King, supra). Mutations in AQP2 cause autosomal recessive nephrogenic diabetes insipidus
(OMIM *107777 Aquaporin 2; AQP2). Reduced AQP4 expression in skeletal muscle may be
associated with Duchenne muscular dystrophy (Frigeri, A. et al. (1998) J. Clin. Invest. 102:695-703).
Mutations in AQP0 cause autosomal dominant cataracts in the mouse (OMIM *154050 Major Intrinsic
Protein of Lens Fiber; MIP).

The metallothioneins (MTs) are a group of small (61 amino acids), cysteine-rich proteins that
bind heavy metals such as cadmium, zinc, mercury, lead, and copper and are thought to play a role in
metal detoxification or the metabolism and homeostasis of metals. Arsenite-resistance proteins have
been identified in hamsters that are resistant to toxic levels of arsenite (Rossman, T.G. et al. (1997) 
Mutat. Res. 386:307-314). 

Humans respond to light and odors by specific protein pathways. Proteins involved in light
perception include rhodopsin, transducin, and cGMP phosphodiesterase. Proteins involved in odor
perception include multiple olfactory receptors. Other proteins are important in human circadian
rhythms and responses to wounds.
Immunity and Host Defense 

All vertebrates have developed sophisticated and complex immune systems that provide
protection from viral, bacterial, fungal and parasitic infections. Included in these systems are the
processes of humoral immunity, the complement cascade and the inflammatory response (Paul, W.E.
(1993) Fundamental Immunology, Raven Press, Ltd., New York NY, pp. 1-20).

The cellular components of the humoral immune system include six different types of
leukocytes: monocytes, lymphocytes, polymorphonuclear granulocytes (consisting of neutrophils,
eosinophils, and basophils) and plasma cells. Additionally, fragments of megakaryocytes, a seventh
type of white blood cell in the bone marrow, occur in large numbers in the blood as platelets.

Leukocytes are formed from two stem cell lineages in bone marrow. The myeloid stem cell
line produces granulocytes and monocytes, and the lymphoid stem cell produces lymphocytes.
Lymphoid cells travel to the thymus, spleen and lymph nodes, where they mature and differentiate
into lymphocytes. Leukocytes are responsible for defending the body against invading pathogens.
Neutrophils and monocytes attack invading bacteria, viruses, and other pathogens and destroy them
by phagocytosis. Monocytes enter tissues and differentiate into macrophages which are extremely
phagocytic. Lymphocytes and plasma cells are a part of the immune system which recognizes
specific foreign molecules and organisms and inactivates them, as well as signals other cells to attack
the invaders.

Granulocytes and monocytes are formed and stored in the bone marrow until needed.
Megakaryocytes are produced in bone marrow, where they fragment into platelets and are released
into the bloodstream. The main function of platelets is to activate the blood clotting mechanism.
Lymphocytes and plasma cells are produced in various lymphogenous organs, including the lymph
nodes, spleen, thymus, and tonsils.

Both neutrophils and macrophages exhibit chemotaxis towards sites of inflammation. Tissue
inflammation in response to pathogen invasion results in production of chemo-attractants for
leukocytes, such as endotoxins or other bacterial products, prostaglandins, and products of leukocytes
or platelets.

Basophils participate in the release of the chemicals involved in the inflammatory process.
The main function of basophils is secretion of these chemicals to such a degree that they have been
referred to as "unicellular endocrine glands". A distinct aspect of basophilic secretion is that the
contents of granules go directly into the extracellular environment, not into vacuoles as occurs with
neutrophils, eosinophils and monocytes. Basophils have receptors for the Fc fragment of
immunoglobulin E (IgE) that are not present on other leukocytes. Crosslinking of membrane IgE
with anti-IgE or other ligands triggers degranulation.

Eosinophils are bi- or multi-nucleated white blood cells which contain eosinophilic granules.
Their plasma membrane is characterized by Ig receptors, particularly IgG and IgE. Generally,
eosinophils are stored in the bone marrow until recruited for use at a site of inflammation or invasion.
They have specific functions in parasitic infections and allergic reactions, and are thought to detoxify
some of the substances released by mast cells and basophils which cause inflammation. Additionally,
they phagocytize antigen-antibody complexes and further help prevent spread of the inflammation.

Macrophages are monocytes that have left the blood stream to settle in tissue. Once
monocytes have migrated into tissues, they do not re-enter the bloodstream. The mononuclear
phagocyte system is comprised of precursor cells in the bone marrow, monocytes in circulation, and
macrophages in tissues. The system is capable of very fast and extensive phagocytosis. A
macrophage may phagocytize over 100 bacteria, digest them and extrude residues, and then survive
for many more months. Macrophages are also capable of ingesting large particles, including red
blood cells and malarial parasites. They increase several-fold in size and transform into macrophages
that are characteristic of the tissue they have entered, surviving in tissues for several months.

Mononuclear phagocytes are essential in defending the body against invasion by foreign
pathogens, particularly intracellular microorganisms such as M. tuberculosis, listeria, leishmania and
toxoplasma. Macrophages can also control the growth of tumorous cells, via both phagocytosis and
secretion of hydrolytic enzymes. Another important function of macrophages is that of processing
antigens and presenting them in a biochemically modified form to lymphocytes.

The immune system responds to invading microorganisms in two major ways: antibody
production and cell mediated responses. Antibodies are immunoglobulin proteins produced by
B-lymphocytes which bind to specific antigens and cause inactivation or promote destruction of the
antigen by other cells. Cell-mediated immune responses involve T-lymphocytes (T cells) that react
with foreign antigen on the surface of infected host cells. Depending on the type of T cell, the
infected cell is either killed or signals are secreted which activate macrophages and other cells to
destroy the infected cell (Paul, supra).

T-lymphocytes originate in the bone marrow or liver in fetuses. Precursor cells migrate via
the blood to the thymus, where they are processed to mature into T-lymphocytes. This processing is
crucial because of positive and negative selection of T cells that will react with foreign antigen and
not with self molecules. After processing, T cells continuously circulate in the blood and secondary
lymphoid tissues, such as lymph nodes, spleen, certain epithelium-associated tissues in the
gastrointestinal tract, respiratory tract and skin. When T-lymphocytes are presented with the
complementary antigen, they are stimulated to proliferate and release large numbers of activated T
cells into the lymph system and the blood system. These activated T cells can survive and circulate
for several days. At the same time, T memory cells are created, which remain in the lymphoid tissue
for months or years. Upon subsequent exposure to that specific antigen, these memory cells will
respond more rapidly and with a stronger response than induced by the original antigen. This creates
an "immunological memory" that can provide immunity for years.

There are two major types of T cells: cytotoxic T cells destroy infected host cells, and helper
T cells activate other white blood cells via chemical signals. One class of helper cell, TH1, activates
macrophages to destroy ingested microorganisms, while another, TH2, stimulates the production of
antibodies by B cells.

Cytotoxic T cells directly attack the infected target cell. In virus-infected cells, peptides
derived from viral proteins are generated by the proteasome. These peptides are transported into the
ER by the transporter associated with antigen processing (TAP) (Pamer, E. and P. Cresswell (1998)
Annu. Rev. Immunol. 16:323-358). Once inside the ER, the peptides bind MHC I chains, and the
peptide/MHC I complex is transported to the cell surface. Receptors on the surface of T cells bind to
antigen presented on cell surface MHC molecules. Once activated by binding to antigen, T cells
secrete γ-interferon, a signal molecule that induces the expression of genes necessary for presenting
viral (or other) antigens to cytotoxic T cells. Cytotoxic T cells kill the infected cell by stimulating
programmed cell death.

Helper T cells constitute up to 75% of the total T cell population. They regulate the immune
functions by producing a variety of lymphokines that act on other cells in the immune system and on
bone marrow. Among these lymphokines are: interleukins-2, 3, 4, 5, 6; granulocyte-monocyte colony
stimulating factor; and γ-interferon.

Helper T cells are required for most B cells to respond to antigen. When an activated helper
cell contacts a B cell, its centrosome and Golgi apparatus become oriented toward the B cell, aiding
the directing of signal molecules, such as a transmembrane-bound protein called CD40 ligand, onto the
B cell surface to interact with the CD40 transmembrane protein. Secreted signals also help B cells to
proliferate and mature and, in some cases, to switch the class of antibody being produced.

B-lymphocytes (B cells) produce antibodies which react with specific antigenic proteins
presented by pathogens. Once activated, B cells become filled with extensive rough endoplasmic
reticulum and are known as plasma cells. As with T cells, interaction of B cells with antigen
stimulates proliferation of only those B cells which produce antibody specific to that antigen. There
are five classes of antibodies, known as immunoglobulins, which together comprise about 20% of
total plasma protein. Each class mediates a characteristic biological response after antigen binding.
Upon activation by specific antigen, B cells switch from making membrane-bound antibody to
secretion of that antibody.

Antibodies, or immunoglobulins (Ig), are the founding members of the Ig superfamily and the
central components of the humoral immune response. Antibodies are either expressed on the surface
of B cells or secreted by B cells into the circulation. Antibodies bind and neutralize blood-borne
foreign antigens. The prototypical antibody is a tetramer consisting of two identical heavy
polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by
disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules.
Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD,
IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of
L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most
common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies
are generally variants or multimers of this basic structure.
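The classification rule in the preceding paragraph, in which the H-chain type fixes the antibody class and either L-chain type may pair with any H-chain pair, can be written as a small lookup. This Python sketch is illustrative only: the spelled-out chain names, the function name `describe_antibody`, and the returned dictionary layout are assumptions for this example, and it models only the basic two-heavy, two-light tetramer, not the variant or multimeric forms.

```python
# Heavy-chain type -> antibody class, as listed in the text.
HEAVY_CHAIN_TO_CLASS = {
    "alpha": "IgA",
    "delta": "IgD",
    "epsilon": "IgE",
    "gamma": "IgG",
    "mu": "IgM",
}
LIGHT_CHAIN_TYPES = ("kappa", "lambda")


def describe_antibody(heavy_chain, light_chain):
    """Name the class and chain composition of a prototypical tetramer."""
    if heavy_chain not in HEAVY_CHAIN_TO_CLASS:
        raise ValueError(f"unknown heavy-chain type: {heavy_chain}")
    if light_chain not in LIGHT_CHAIN_TYPES:
        raise ValueError(f"unknown light-chain type: {light_chain}")
    return {
        "class": HEAVY_CHAIN_TO_CLASS[heavy_chain],
        # Two identical H-chains paired with two identical L-chains.
        "chains": [heavy_chain] * 2 + [light_chain] * 2,
    }


igg_kappa = describe_antibody("gamma", "kappa")
```

The light-chain type does not affect the class, which is why the lookup is keyed on the heavy chain alone.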

H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant
region. Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain
contains four Ig domains, three of which occur within the constant region and one of which occurs
within the variable region and contributes to the formation of the antigen recognition site. Likewise,
a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of
which occurs within the variable region. In addition, H chains such as μ have been shown to
associate with other polypeptides during differentiation of the B cell.

Antibodies can be described in terms of their two main functional domains. Antigen
recognition is mediated by the Fab (antigen binding fragment) region of the antibody, while effector
functions are mediated by the Fc (crystallizable fragment) region. Binding of antibody to an antigen,
such as a bacterium, triggers the destruction of the antigen by phagocytic white blood cells such as
macrophages and neutrophils. These cells express surface receptors that specifically bind to the
antibody Fc region and allow the phagocytic cells to engulf, ingest, and degrade the antibody-bound
antigen. The Fc receptors expressed by phagocytic cells are single-pass transmembrane glycoproteins
of about 300 to 400 amino acids (Sears, D.W. et al. (1990) J. Immunol. 144:371-378). The
extracellular portion of the Fc receptor typically contains two or three Ig domains.

Diseases which cause over- or under-abundance of any one type of leukocyte usually result in
the entire immune defense system becoming involved. A well-known immunodeficiency disease is AIDS
(Acquired Immunodeficiency Syndrome) where the number of helper T cells is depleted, leaving the
patient susceptible to infection by microorganisms and parasites. Another widespread medical
condition attributable to the immune system is that of allergic reactions to certain antigens. Allergic
reactions include: hay fever, asthma, anaphylaxis, and urticaria (hives). Leukemias are an excess
production of white blood cells, to the point where a major portion of the body's metabolic resources
are directed solely at proliferation of white blood cells, leaving other tissues to starve. Leukopenia or
agranulocytosis occurs when the bone marrow stops producing white blood cells. This leaves the
body unprotected against foreign microorganisms, including those which normally inhabit skin,
mucous membranes, and gastrointestinal tract. If all white blood cell production stops completely,
infection will occur within two days and death may follow only 1 to 4 days later.

Impaired phagocytosis occurs in several diseases, including monocytic leukemia, systemic
lupus, and granulomatous disease. In such a situation, macrophages can phagocytize normally, but
the enveloped organism is not killed. A defect in the plasma membrane enzyme which converts
oxygen to lethally reactive forms results in abscess formation in liver, lungs, spleen, lymph nodes,
and beneath the skin. Eosinophilia is an excess of eosinophils commonly observed in patients with
allergies (hay fever, asthma), allergic reactions to drugs, rheumatoid arthritis, and cancers (Hodgkin's
disease, lung, and liver cancer) (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal
Medicine, McGraw-Hill, Inc., New York NY).

Host defense is further augmented by the complement system. The complement system
serves as an effector system and is involved in infectious agent recognition. It can function as an
independent immune network or in conjunction with other humoral immune responses. The
complement system is comprised of numerous plasma and membrane proteins that act in a cascade of
reaction sequences whereby one component activates the next. The result is a rapid and amplified
response to infection through either an inflammatory response or increased phagocytosis.

The complement system has more than 30 protein components which can be divided into
functional groupings including modified serine proteases, membrane-binding proteins and regulators
of complement activation. Activation occurs through two different pathways, the classical and the
alternative. Both pathways serve to destroy infectious agents through distinct triggering mechanisms
that eventually merge with the involvement of the component C3.

The classical pathway requires antibody binding to infectious agent antigens. The antibodies
serve to define the target and initiate the complement system cascade, culminating in the destruction
of the infectious agent. In this pathway, since the antibody guides initiation of the process, the
complement can be seen as an effector arm of the humoral immune system.

The alternative pathway of the complement system does not require the presence of
pre-existing antibodies for targeting infectious agent destruction. Rather, this pathway, through low
levels of an activated component, remains constantly primed and provides surveillance in the
non-immune host to enable targeting and destruction of infectious agents. In this case foreign material
triggers the cascade, thereby facilitating phagocytosis or lysis (Paul, supra, pp. 918-919).

Another important component of host defense is the process of inflammation. Inflammatory
responses are divided into four categories on the basis of pathology and include allergic
inflammation, cytotoxic antibody mediated inflammation, immune complex mediated inflammation
and monocyte mediated inflammation. Inflammation manifests as a combination of each of these
forms with one predominating.

Allergic acute inflammation is observed in individuals wherein specific antigens stimulate
IgE antibody production. Mast cells and basophils are subsequently activated by the attachment of
antigen-IgE complexes, resulting in the release of cytoplasmic granule contents such as histamine.
The products of activated mast cells can increase vascular permeability and constrict the smooth
muscle of breathing passages, resulting in anaphylaxis or asthma. Acute inflammation is also
mediated by cytotoxic antibodies and can result in the destruction of tissue through the binding of
complement-fixing antibodies to cells. The responsible antibodies are of the IgG or IgM types.
Resultant clinical disorders include autoimmune hemolytic anemia and thrombocytopenia as
associated with systemic lupus erythematosus.

Immune complex mediated acute inflammation involves the IgG or IgM antibody types
which combine with antigen to activate the complement cascade. When such immune complexes
bind to neutrophils and macrophages they activate the respiratory burst to form protein- and
vessel-damaging agents such as hydrogen peroxide, hydroxyl radical, hypochlorous acid, and chloramines.
Clinical manifestations include rheumatoid arthritis and systemic lupus erythematosus.

In chronic inflammation or delayed-type hypersensitivity, macrophages are activated and
process antigen for presentation to T cells that subsequently produce lymphokines and monokines.
This type of inflammatory response is likely important for defense against intracellular parasites and
certain viruses. Clinical associations include granulomatous disease, tuberculosis, leprosy, and
sarcoidosis (Paul, W.E., supra, pp. 1017-1018).

Extracellular Information Transmission Molecules

Intercellular communication is essential for the growth and survival of multicellular
organisms, and in particular, for the function of the endocrine, nervous, and immune systems. In
addition, intercellular communication is critical for developmental processes such as tissue
construction and organogenesis, in which cell proliferation, cell differentiation, and morphogenesis
must be spatially and temporally regulated in a precise and coordinated manner. Cells communicate
with one another through the secretion and uptake of diverse types of signaling molecules such as
hormones, growth factors, neuropeptides, and cytokines.
Hormones 

Hormones are signaling molecules that coordinately regulate basic physiological processes
from embryogenesis throughout adulthood. These processes include metabolism, respiration,
reproduction, excretion, fetal tissue differentiation and organogenesis, growth and development,
homeostasis, and the stress response. Hormonal secretions and the nervous system are tightly
integrated and interdependent. Hormones are secreted by endocrine glands, primarily the hypothalamus
and pituitary, the thyroid and parathyroid, the pancreas, the adrenal glands, and the ovaries and testes.

The secretion of hormones into the circulation is tightly controlled. Hormones are often
secreted in diurnal, pulsatile, and cyclic patterns. Hormone secretion is regulated by perturbations in
blood biochemistry, by other upstream-acting hormones, by neural impulses, and by negative feedback
loops. Blood hormone concentrations are constantly monitored and adjusted to maintain optimal,
steady-state levels. Once secreted, hormones act only on those target cells that express specific
receptors.

Most disorders of the endocrine system are caused by either hyposecretion or hypersecretion of
hormones. Hyposecretion often occurs when a hormone's gland of origin is damaged or otherwise
impaired. Hypersecretion often results from the proliferation of tumors derived from hormone-secreting
cells. Inappropriate hormone levels may also be caused by defects in regulatory feedback loops or in
the processing of hormone precursors. Endocrine malfunction may also occur when the target cell fails
to respond to the hormone.

Hormones can be classified biochemically as polypeptides, steroids, eicosanoids, or amines.
Polypeptides, which include diverse hormones such as insulin and growth hormone, vary in size and
function and are often synthesized as inactive precursors that are processed intracellularly into mature,
active forms. Amines, which include epinephrine and dopamine, are amino acid derivatives that
function in neuroendocrine signaling. Steroids, which include the cholesterol-derived hormones
estrogen and testosterone, function in sexual development and reproduction. Eicosanoids, which

50 



WOOD/73509 PCT/USOO/15404 

include prostaglandiiis and prostacydinSt are fatty add derivatives that function in a vari^y of 
processes. Most polypq>tides and some aniines are soluble in tbe circulation v/hext tbey are liigHly 
susceptible to proteolytic d^adation within seconds after thdr secretioa Staoids and lipids are 
insoluble and must be transported in the circulation by carrier protdns. The following discussion will 
5 focus primarily on polypeptide hormones. 

Hormones secreted by the hypothalamus and pituitary gland play a critical role in endocrine
function by coordinately regulating hormonal secretions from other endocrine glands in response to
neural signals. Hypothalamic hormones include thyrotropin-releasing hormone, gonadotropin-releasing
hormone, somatostatin, growth-hormone releasing factor, corticotropin-releasing hormone, substance P,
dopamine, and prolactin-releasing hormone. These hormones directly regulate the secretion of
hormones from the anterior lobe of the pituitary. Hormones secreted by the anterior pituitary include
adrenocorticotropic hormone (ACTH), melanocyte-stimulating hormone, somatotropic hormones such
as growth hormone and prolactin, glycoprotein hormones such as thyroid-stimulating hormone,
luteinizing hormone (LH), and follicle-stimulating hormone (FSH), β-lipotropin, and β-endorphins.
These hormones regulate hormonal secretions from the thyroid, pancreas, and adrenal glands, and act
directly on the reproductive organs to stimulate ovulation and spermatogenesis. The posterior pituitary
synthesizes and secretes antidiuretic hormone (ADH, vasopressin) and oxytocin.

Disorders of the hypothalamus and pituitary often result from lesions such as primary brain
tumors, adenomas, infarction associated with pregnancy, hypophysectomy, aneurysms, vascular
malformations, thrombosis, infections, immunological disorders, and complications due to head trauma.
Such disorders have profound effects on the function of other endocrine glands. Disorders associated
with hypopituitarism include hypogonadism, Sheehan syndrome, diabetes insipidus, Kallmann's disease,
Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and
dwarfism. Disorders associated with hyperpituitarism include acromegaly, giantism, and syndrome of
inappropriate ADH secretion (SIADH), often caused by benign adenomas.

Hormones secreted by the thyroid and parathyroid primarily control metabolic rates and the
regulation of serum calcium levels, respectively. Thyroid hormones include calcitonin, somatostatin,
and thyroid hormone. The parathyroid secretes parathyroid hormone. Disorders associated with
hypothyroidism include goiter, myxedema, acute thyroiditis associated with bacterial infection,
subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and
cretinism. Disorders associated with hyperthyroidism include thyrotoxicosis and its various forms,
Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's
disease. Disorders associated with hyperparathyroidism include Conn disease (chronic hypercalcemia)
leading to bone resorption and parathyroid hyperplasia.

Hormones secreted by the pancreas regulate blood glucose levels by modulating the rates of
carbohydrate, fat, and protein metabolism. Pancreatic hormones include insulin, glucagon, amylin,
γ-aminobutyric acid, gastrin, somatostatin, and pancreatic polypeptide. The principal disorder associated
with pancreatic dysfunction is diabetes mellitus caused by insufficient insulin activity. Diabetes
mellitus is generally classified as either Type I (insulin-dependent, juvenile diabetes) or Type II
(non-insulin-dependent, adult diabetes). The treatment of both forms by insulin replacement therapy is
well known. Diabetes mellitus often leads to acute complications such as hypoglycemia (insulin shock),
coma, diabetic ketoacidosis, lactic acidosis, and chronic complications leading to disorders of the eye,
kidney, skin, bone, joint, cardiovascular system, nervous system, and to decreased resistance to
infection.

The anatomy, physiology, and diseases related to hormonal function are reviewed in McCance,
K.L. and S.E. Huether (1994) Pathophysiology: The Biological Basis for Disease in Adults and
Children, Mosby-Year Book, Inc., St. Louis MO; Greenspan, F.S. and J.D. Baxter (1994) Basic and
Clinical Endocrinology, Appleton and Lange, East Norwalk CT.

Growth Factors

Growth factors are secreted proteins that mediate intercellular communication. Unlike
hormones, which travel great distances via the circulatory system, most growth factors are primarily
local mediators that act on neighboring cells. Most growth factors contain a hydrophobic N-terminal
signal peptide sequence which directs the growth factor into the secretory pathway. Most growth
factors also undergo post-translational modifications within the secretory pathway. These
modifications can include proteolysis, glycosylation, phosphorylation, and intramolecular disulfide bond
formation. Once secreted, growth factors bind to specific receptors on the surfaces of neighboring
target cells, and the bound receptors trigger intracellular signal transduction pathways. These signal
transduction pathways elicit specific cellular responses in the target cells. These responses can include
the modulation of gene expression and the stimulation or inhibition of cell division, cell differentiation,
and cell motility.

Growth factors fall into at least two broad and overlapping classes. The broadest class
includes the large polypeptide growth factors, which are wide-ranging in their effects. These factors
include epidermal growth factor (EGF), fibroblast growth factor (FGF), transforming growth factor-β
(TGF-β), insulin-like growth factor (IGF), nerve growth factor (NGF), and platelet-derived growth
factor (PDGF), each defining a family of numerous related factors. The large polypeptide growth
factors, with the exception of NGF, act as mitogens on diverse cell types to stimulate wound healing,
bone synthesis and remodeling, extracellular matrix synthesis, and proliferation of epithelial,
epidermal, and connective tissues. Members of the TGF-β, EGF, and FGF families also function as
inductive signals in the differentiation of embryonic tissue. NGF functions specifically as a
neurotrophic factor, promoting neuronal growth and differentiation.

Another class of growth factors includes the hematopoietic growth factors, which are narrow in
their target specificity. These factors stimulate the proliferation and differentiation of blood cells such
as B-lymphocytes, T-lymphocytes, erythrocytes, platelets, eosinophils, basophils, neutrophils,
macrophages, and their stem cell precursors. These factors include the colony-stimulating factors
(G-CSF, M-CSF, GM-CSF, and CSF1-3), erythropoietin, and the cytokines. The cytokines are specialized
hematopoietic factors secreted by cells of the immune system and are discussed in detail below.

Growth factors play critical roles in neoplastic transformation of cells in vitro and in tumor
progression in vivo. Overexpression of the large polypeptide growth factors promotes the
proliferation and transformation of cells in culture. Inappropriate expression of these growth factors
by tumor cells in vivo may contribute to tumor vascularization and metastasis. Inappropriate activity
of hematopoietic growth factors can result in anemias, leukemias, and lymphomas. Moreover, growth
factors are both structurally and functionally related to oncoproteins, the potentially cancer-causing
products of proto-oncogenes. Certain FGF and PDGF family members are themselves homologous to
oncoproteins, whereas receptors for some members of the EGF, NGF, and FGF families are encoded
by proto-oncogenes. Growth factors also affect the transcriptional regulation of both proto-oncogenes
and oncosuppressor genes (Pimentel, E. (1994) Handbook of Growth Factors, CRC Press, Ann Arbor
MI; McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University
Press, New York NY; Habenicht, A., ed. (1990) Growth Factors, Differentiation Factors, and
Cytokines, Springer-Verlag, New York NY).

In addition, some of the large polypeptide growth factors play crucial roles in the induction of
the primordial germ layers in the developing embryo. This induction ultimately results in the formation
of the embryonic mesoderm, ectoderm, and endoderm, which in turn provide the framework for the
entire adult body plan. Disruption of this inductive process would be catastrophic to embryonic
development.

Small Peptide Factors - Neuropeptides and Vasomediators

Neuropeptides and vasomediators (NP/VM) comprise a family of small peptide factors,
typically of 20 amino acids or less. These factors generally function in neuronal excitation and
inhibition of vasoconstriction/vasodilation, muscle contraction, and hormonal secretions from the
brain and other endocrine tissues. Included in this family are neuropeptides and neuropeptide
hormones such as bombesin, neuropeptide Y, neurotensin, neuromedin N, melanocortins, opioids,
galanin, somatostatin, tachykinins, urotensin II and related peptides involved in smooth muscle
stimulation, vasopressin, vasoactive intestinal peptide, and circulatory system-borne signaling
molecules such as angiotensin, complement, calcitonin, endothelins, formyl-methionyl peptides,
glucagon, cholecystokinin, gastrin, and many of the peptide hormones discussed above. NP/VMs can
transduce signals directly, modulate the activity or release of other neurotransmitters and hormones, and
act as catalytic enzymes in signaling cascades. The effects of NP/VMs range from extremely brief to
long-lasting. (Reviewed in Martin, C.R. et al. (1985) Endocrine Physiology, Oxford University Press,
New York NY, pp. 57-62.)

Cytokines

Cytokines comprise a family of signaling molecules that modulate the immune system and the
inflammatory response. Cytokines are usually secreted by leukocytes, or white blood cells, in response
to injury or infection. Cytokines function as growth and differentiation factors that act primarily on
cells of the immune system such as B- and T-lymphocytes, monocytes, macrophages, and granulocytes.
Like other signaling molecules, cytokines bind to specific plasma membrane receptors and trigger
intracellular signal transduction pathways which alter gene expression patterns. There is considerable
potential for the use of cytokines in the treatment of inflammation and immune system disorders.

Cytokine structure and function have been extensively characterized in vitro. Most cytokines
are small polypeptides of about 30 kilodaltons or less. Over 50 cytokines have been identified from
human and rodent sources. Examples of cytokine subfamilies include the interferons (IFN-α, -β, and
-γ), the interleukins (IL1-IL13), the tumor necrosis factors (TNF-α and -β), and the chemokines. Many
cytokines have been produced using recombinant DNA techniques, and the activities of individual
cytokines have been determined in vitro. These activities include regulation of leukocyte proliferation,
differentiation, and motility.

The activity of an individual cytokine in vitro may not reflect the full scope of that cytokine's
activity in vivo. Cytokines are not expressed individually in vivo but are instead expressed in
combination with a multitude of other cytokines when the organism is challenged with a stimulus.
Together, these cytokines collectively modulate the immune response in a manner appropriate for that
particular stimulus. Therefore, the physiological activity of a cytokine is determined by the stimulus
itself and by complex interactive networks among co-expressed cytokines which may demonstrate both
synergistic and antagonistic relationships.

Chemokines comprise a cytokine subfamily with over 30 members. (Reviewed in Wells,
T.N.C. and M.C. Peitsch (1997) J. Leukoc. Biol. 61:545-550.) Chemokines were initially identified as
chemotactic proteins that recruit monocytes and macrophages to sites of inflammation. Recent evidence
indicates that chemokines may also play key roles in hematopoiesis and HIV-1 infection. Chemokines
are small proteins which range from about 6-15 kilodaltons in molecular weight. Chemokines are
further classified as C, CC, CXC, or CX3C based on the number and position of critical cysteine
residues. The CC chemokines, for example, each contain a conserved motif consisting of two
consecutive cysteines followed by two additional cysteines which occur downstream at 24- and 16-
residue intervals, respectively (ExPASy PROSITE database, documents PS00472 and PDOC00434).
The presence and spacing of these four cysteine residues are highly conserved, whereas the intervening
residues diverge significantly. However, a conserved tyrosine located about 15 residues downstream of
the cysteine doublet seems to be important for chemotactic activity. Most of the human genes encoding
CC chemokines are clustered on chromosome 17, although there are a few examples of CC chemokine
genes that map elsewhere. Other chemokines include lymphotactin (C chemokine); macrophage
chemotactic and activating factor (MCAF/MCP-1; CC chemokine); platelet factor 4 and IL-8 (CXC
chemokines); and fractalkine and neurotactin (CX3C chemokines). (Reviewed in Luster, A.D. (1998)
N. Engl. J. Med. 338:436-445.)
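
The cysteine spacing described above for CC chemokines can be illustrated with a short sequence scan. The following is a minimal sketch only, not the PROSITE PS00472 signature itself: it assumes that the 24- and 16-residue intervals may be read as the approximate number of residues separating successive cysteines, and the function name find_cc_motif, the tolerance parameter tol, and the test sequence are hypothetical and introduced purely for illustration.

```python
import re


def find_cc_motif(seq, gap1=24, gap2=16, tol=2):
    """Return 0-based start positions of candidate CC-type cysteine motifs.

    Assumption (not from PROSITE): the motif is modeled as two adjacent
    cysteines, then roughly gap1 residues, a third cysteine, then roughly
    gap2 residues, and a fourth cysteine. tol widens each gap by +/- tol.
    """
    lo1, hi1 = gap1 - tol, gap1 + tol
    lo2, hi2 = gap2 - tol, gap2 + tol
    # Builds a pattern such as "CC.{22,26}C.{14,18}C" for the defaults.
    pattern = re.compile(f"CC.{{{lo1},{hi1}}}C.{{{lo2},{hi2}}}C")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Synthetic (hypothetical) sequence: CC, 24 alanines, C, 16 alanines, C.
example = "MK" + "CC" + "A" * 24 + "C" + "A" * 16 + "C" + "GG"
print(find_cc_motif(example))
```

A scan of this kind only flags candidate spacing patterns; the intervening residues, which the text notes diverge significantly, are deliberately left unconstrained, so a hit would still need to be checked against the curated database signature.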

Receptor Molecules 

SEQ ID NO:6 and SEQ ID NO:7 encode, for example, receptor molecules.

The term receptor describes proteins that specifically recognize other molecules. The category
is broad and includes proteins with a variety of functions. The bulk of receptors are cell surface
proteins which bind extracellular ligands and produce cellular responses in the areas of growth,
differentiation, endocytosis, and immune response. Other receptors facilitate the selective transport of
proteins out of the endoplasmic reticulum and localize enzymes to particular locations in the cell. The
term may also be applied to proteins which act as receptors for ligands with known or unknown
chemical composition and which interact with other cellular components. For example, the steroid
hormone receptors bind to and regulate transcription of DNA.

Regulation of cell proliferation, differentiation, and migration is important for the formation
and function of tissues. Regulatory proteins such as growth factors coordinately control these cellular
processes and act as mediators in cell-cell signaling pathways. Growth factors are secreted proteins
that bind to specific cell-surface receptors on target cells. The bound receptors trigger intracellular
signal transduction pathways which activate various downstream effectors that regulate gene
expression, cell division, cell differentiation, cell motility, and other cellular processes.

Cell surface receptors are typically integral plasma membrane proteins. These receptors
recognize hormones such as catecholamines; peptide hormones; growth and differentiation factors;
small peptide factors such as thyrotropin-releasing hormone, galanin, somatostatin, and tachykinins;
and circulatory system-borne signaling molecules. Cell surface receptors on immune system cells
recognize antigens, antibodies, and major histocompatibility complex (MHC)-bound peptides. Other
cell surface receptors bind ligands to be internalized by the cell. This receptor-mediated endocytosis
functions in the uptake of low density lipoproteins (LDL), transferrin, glucose- or mannose-terminal
glycoproteins, galactose-terminal glycoproteins, immunoglobulins, phosphovitellogenins, fibrin,
proteinase-inhibitor complexes, plasminogen activators, and thrombospondin (Lodish, H. et al. (1995)
Molecular Cell Biology, Scientific American Books, New York NY, p. 723; Mikhailenko, I. et al.
(1997) J. Biol. Chem. 272:6784-6791).

Receptor Protein Kinases

Many growth factor receptors, including receptors for epidermal growth factor,
platelet-derived growth factor, fibroblast growth factor, as well as the growth modulator α-thrombin,
contain intrinsic protein kinase activities. When growth factor binds to the receptor, it triggers the
autophosphorylation of a serine, threonine, or tyrosine residue on the receptor. These phosphorylated
sites are recognition sites for the binding of other cytoplasmic signaling proteins. These proteins
participate in signaling pathways that eventually link the initial receptor activation at the cell surface to
the activation of a specific intracellular target molecule. In the case of tyrosine residue
autophosphorylation, these signaling proteins contain a common domain referred to as a Src homology
(SH) domain. SH2 domains and SH3 domains are found in phospholipase C-γ, PI-3-K p85 regulatory
subunit, Ras-GTPase activating protein, and pp60c-src (Lowenstein, E.J. et al. (1992) Cell 70:431-442).
The cytokine family of receptors share a different common binding domain and include transmembrane
receptors for growth hormone (GH), interleukins, erythropoietin, and prolactin.

Other receptors and second messenger-binding proteins have intrinsic serine/threonine protein
kinase activity. These include activin/TGF-β/BMP-superfamily receptors, calcium- and
diacylglycerol-activated/phospholipid-dependent protein kinase (PK-C), and RNA-dependent protein
kinase (PK-R). In addition, other serine/threonine protein kinases, including nematode Twitchin, have
fibronectin-like, immunoglobulin C2-like domains.

G-Protein Coupled Receptors

G-protein coupled receptors (GPCRs) are integral membrane proteins characterized by the
presence of seven hydrophobic transmembrane domains which span the plasma membrane and form a
bundle of antiparallel alpha (α) helices. These proteins range in size from under 400 to over 1000
amino acids (Strosberg, A.D. (1991) Eur. J. Biochem. 196:1-10; Coughlin, S.R. (1994) Curr. Opin.
Cell Biol. 6:191-197). The amino-terminus of the GPCR is extracellular, of variable length and often
glycosylated; the carboxy-terminus is cytoplasmic and generally phosphorylated. Extracellular loops of
the GPCR alternate with intracellular loops and link the transmembrane domains. The most conserved
domains of GPCRs are the transmembrane domains and the first two cytoplasmic loops. The
transmembrane domains account for structural and functional features of the receptor. In most cases,
the bundle of α helices forms a binding pocket. In addition, the extracellular N-terminal segment or one
or more of the three extracellular loops may also participate in ligand binding. Ligand binding activates
the receptor by inducing a conformational change in intracellular portions of the receptor. The
activated receptor, in turn, interacts with an intracellular heterotrimeric guanine nucleotide binding (G)
protein complex which mediates further intracellular signaling activities, generally the production of
second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate, or interactions
with ion channel proteins (Baldwin, J.M. (1994) Curr. Opin. Cell Biol. 6:180-190).

GPCRs include those for acetylcholine, adenosine, epinephrine and norepinephrine, bombesin,
bradykinin, chemokines, dopamine, endothelin, γ-aminobutyric acid (GABA), follicle-stimulating
hormone (FSH), glutamate, gonadotropin-releasing hormone (GnRH), hepatocyte growth factor,
histamine, leukotrienes, melanocortins, neuropeptide Y, opioid peptides, opsins, prostanoids, serotonin,
somatostatin, tachykinins, thrombin, thyrotropin-releasing hormone (TRH), vasoactive intestinal
polypeptide family, vasopressin and oxytocin, and orphan receptors.

GPCR mutations, which may cause loss of function or constitutive activation, have been
associated with numerous human diseases (Coughlin, supra). For instance, retinitis pigmentosa may
arise from mutations in the rhodopsin gene. Rhodopsin is the retinal photoreceptor which is located
within the discs of the eye rod cell. Parma, J. et al. (1993, Nature 365:649-651) report that somatic
activating mutations in the thyrotropin receptor cause hyperfunctioning thyroid adenomas and suggest
that certain GPCRs susceptible to constitutive activation may behave as protooncogenes.

Nuclear Receptors

Nuclear receptors bind small molecules such as hormones or second messengers, leading to
increased receptor-binding affinity to specific chromosomal DNA elements. In addition, the affinity for
other nuclear proteins may also be altered. Such binding and protein-protein interactions may regulate
and modulate gene expression. Examples of such receptors include the steroid hormone receptors
family, the retinoic acid receptors family, and the thyroid hormone receptors family.

Ligand-Gated Receptor Ion Channels

Ligand-gated receptor ion channels fall into two categories. The first category, extracellular
ligand-gated receptor ion channels (ELGs), rapidly transduce neurotransmitter-binding events into
electrical signals, such as fast synaptic neurotransmission. ELG function is regulated by
post-translational modification. The second category, intracellular ligand-gated receptor ion channels
(ILGs), are activated by many intracellular second messengers and do not require post-translational
modification(s) to effect a channel-opening response.

ELGs depolarize excitable cells to the threshold of action potential generation. In non-excitable
cells, ELGs permit a limited calcium ion-influx during the presence of agonist. ELGs include channels
directly gated by neurotransmitters such as acetylcholine, L-glutamate, glycine, ATP, serotonin,
GABA, and histamine. ELG genes encode proteins having strong structural and functional similarities.
ILGs are encoded by distinct and unrelated gene families and include receptors for cAMP, cGMP,
calcium ions, ATP, and metabolites of arachidonic acid.

Macrophage Scavenger Receptors

Macrophage scavenger receptors with broad ligand specificity may participate in the binding of
low density lipoproteins (LDL) and foreign antigens. Scavenger receptors types I and II are trimeric
membrane proteins with each subunit containing a small N-terminal intracellular domain, a
transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The
extracellular domain contains a short spacer domain, an α-helical coiled-coil domain, and a triple helical
collagenous domain. These receptors have been shown to bind a spectrum of ligands, including
chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and
asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; Elomaa, O. et al.
(1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in atherogenesis by
mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial
endotoxins, bacteria, and protozoa.

T-Cell Receptors

T cells play a dual role in the immune system as effectors and regulators, coupling antigen
recognition with the transmission of signals that induce cell death in infected cells and stimulate
proliferation of other immune cells. Although a population of T cells can recognize a wide range of
different antigens, an individual T cell can only recognize a single antigen and only when it is presented
to the T cell receptor (TCR) as a peptide complexed with a major histocompatibility molecule (MHC)
on the surface of an antigen presenting cell. The TCR on most T cells consists of immunoglobulin-like
integral membrane glycoproteins containing two polypeptide subunits, α and β, of similar molecular
weight. Both TCR subunits have an extracellular domain containing both variable and constant
regions, a transmembrane domain that traverses the membrane once, and a short intracellular domain
(Saito, H. et al. (1984) Nature 309:757-762). The genes for the TCR subunits are constructed through
somatic rearrangement of different gene segments. Interaction of antigen in the proper MHC context
with the TCR initiates signaling cascades that induce the proliferation, maturation, and function of
cellular components of the immune system (Weiss, A. (1991) Annu. Rev. Genet. 25:487-510).
Rearrangements in TCR genes and alterations in TCR expression have been noted in lymphomas,
leukemias, autoimmune disorders, and immunodeficiency disorders (Aisenberg, A.C. et al. (1985) N.
Engl. J. Med. 313:529-533; Weiss, supra).

Intracellular Signaling Molecules

SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:12 encode,
for example, intracellular signaling molecules.

Intracellular signaling is the general process by which cells respond to extracellular signals
(hormones, neurotransmitters, growth and differentiation factors, etc.) through a cascade of
biochemical reactions that begins with the binding of a signaling molecule to a cell membrane
receptor and ends with the activation of an intracellular target molecule. Intermediate steps in the
process involve the activation of various cytoplasmic proteins by phosphorylation via protein kinases,
and their deactivation by protein phosphatases, and the eventual translocation of some of these
activated proteins to the cell nucleus where the transcription of specific genes is triggered. The
intracellular signaling process regulates all types of cell functions including cell proliferation, cell
differentiation, and gene transcription, and involves a diversity of molecules including protein kinases
and phosphatases, and second messenger molecules, such as cyclic nucleotides, calcium-calmodulin,
inositol, and various mitogens, that regulate protein phosphorylation.

Protein Phosphorylation

Protein kinases and phosphatases play a key role in the intracellular signaling process by 
controlling the phosphorylation and activation of various signaling proteins. The high energy 
phosphate for this reaction is generally transferred from the adenosine triphosphate molecule (ATP) to 
a particular protein by a protein kinase and removed from that protein by a protein phosphatase. 
Protein kinases are roughly divided into two groups: those that phosphorylate tyrosine residues 
(protein tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues
(serine/threonine kinases, STK). A few protein kinases have dual specificity for serine/threonine and
tyrosine residues. Almost all kinases contain a conserved 250-300 amino acid catalytic domain
containing specific residues and sequence motifs characteristic of the kinase family (Hardie, G. and S.
Hanks (1995) The Protein Kinase Facts Books, Vol 1:7-20, Academic Press, San Diego CA).

STKs include the second messenger dependent protein kinases such as the cyclic-AMP
dependent protein kinases (PKA), involved in mediating hormone-induced cellular responses;
calcium-calmodulin (CaM) dependent protein kinases, involved in regulation of smooth muscle
contraction, glycogen breakdown, and neurotransmission; and the mitogen-activated protein kinases
(MAP) which mediate signal transduction from the cell surface to the nucleus via phosphorylation
cascades. Altered PKA expression is implicated in a variety of disorders and diseases including
cancer, thyroid disorders, diabetes, atherosclerosis, and cardiovascular disease (Isselbacher, K.J. et al.
(1994) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 416-431, 1887).

PTKs are divided into transmembrane, receptor PTKs and nontransmembrane, non-receptor 
PTKs. Transmembrane PTKs are receptors for most growth factors. Non-receptor PTKs lack 
transmembrane regions and, instead, form complexes with the intracellular regions of cell surface 
receptors. Receptors that function through non-receptor PTKs include those for cytokines and 
hormones (growth hormone and prolactin) and antigen-specific receptors on T and B lymphocytes. 
Many of these PTKs were first identified as the products of mutant oncogenes in cancer cells in which 
their activation was no longer subject to normal cellular controls. In fact, about one third of the 
known oncogenes encode PTKs, and it is well known that cellular transformation (oncogenesis) is
often accompanied by increased tyrosine phosphorylation activity (Charbonneau, H. and N.K. Tonks 
(1992) Annu. Rev. Cell Biol. 8:463-493). 

An additional family of protein kinases previously thought to exist only in procaryotes is the
histidine protein kinase family (HPK). HPKs bear little homology with mammalian STKs or PTKs
but have distinctive sequence motifs of their own (Davie, J.R. et al. (1993) J. Biol. Chem.
270:19861-19867). A histidine residue in the N-terminal half of the molecule (region I) is an
autophosphorylation site. Three additional motifs located in the C-terminal half of the molecule
include an invariant asparagine residue in region II and two glycine-rich loops characteristic of
nucleotide binding domains in regions III and IV. Recently a branched chain alpha-ketoacid 
dehydrogenase kinase has been found with characteristics of HPK in rat (Davie, supra) . 

Protein phosphatases regulate the effects of protein kinases by removing phosphate groups
from molecules previously activated by kinases. The two principal categories of protein phosphatases
are the protein (serine/threonine) phosphatases (PPs) and the protein tyrosine phosphatases (PTPs).
PPs dephosphorylate phosphoserine/threonine residues and are important regulators of many
cAMP-mediated hormone responses (Cohen, P. (1989) Annu. Rev. Biochem. 58:453-508). PTPs
reverse the effects of protein tyrosine kinases and play a significant role in cell cycle and cell
signaling processes (Charbonneau, supra). As previously noted, many PTKs are encoded by
oncogenes, and oncogenesis is often accompanied by increased tyrosine phosphorylation activity. It
is therefore possible that PTPs may prevent or reverse cell transformation and the growth of various
cancers by controlling the levels of tyrosine phosphorylation in cells. This hypothesis is supported by
studies showing that overexpression of PTPs can suppress transformation in cells, and that specific
inhibition of PTPs can enhance cell transformation (Charbonneau, supra).

Phospholipid and Inositol-Phosphate Signaling

Inositol phospholipids (phosphoinositides) are involved in an intracellular signaling pathway 
that begins with binding of a signaling molecule to a G-protein linked receptor in the plasma
membrane. This leads to the phosphorylation of phosphatidylinositol (PI) residues on the inner side
of the plasma membrane to the biphosphate state (PIP2) by inositol kinases. Simultaneously, the
G-protein linked receptor binding stimulates a trimeric G-protein which in turn activates a
phosphoinositide-specific phospholipase C-β. Phospholipase C-β then cleaves PIP2 into two
products, inositol triphosphate (IP3) and diacylglycerol. These two products act as mediators for
separate signaling events. IP3 diffuses through the plasma membrane to induce calcium release from
the endoplasmic reticulum (ER), while diacylglycerol remains in the membrane and helps activate
protein kinase C, an STK that phosphorylates selected proteins in the target cell. The calcium
response initiated by IP3 is terminated by the dephosphorylation of IP3 by specific inositol
phosphatases. Cellular responses that are mediated by this pathway are glycogen breakdown in the 
liver in response to vasopressin, smooth muscle contraction in response to acetylcholine, and 
thrombin-induced platelet aggregation. 
Cyclic Nucleotide Signaling

Cyclic nucleotides (cAMP and cGMP) function as intracellular second messengers to
transduce a variety of extracellular signals including hormones, light, and neurotransmitters. In
particular, cyclic-AMP dependent protein kinases (PKA) are thought to account for all of the effects
of cAMP in most mammalian cells, including various hormone-induced cellular responses. Visual
excitation and the phototransmission of light signals in the eye is controlled by cyclic-GMP regulated,
Ca2+-specific channels. Because of the importance of cellular levels of cyclic nucleotides in
mediating these various responses, regulating the synthesis and breakdown of cyclic nucleotides is an
important matter. Thus adenylyl cyclase, which synthesizes cAMP from ATP, is activated to
increase cAMP levels in muscle by binding of adrenaline to β-adrenergic receptors, while activation
of guanylate cyclase and increased cGMP levels in photoreceptors leads to reopening of the
Ca2+-specific channels and recovery of the dark state in the eye. In contrast, hydrolysis of cyclic
nucleotides by cAMP and cGMP-specific phosphodiesterases (PDEs) produces the opposite of these
and other effects mediated by increased cyclic nucleotide levels. PDEs appear to be particularly
important in the regulation of cyclic nucleotides, considering the diversity found in this family of
proteins. At least seven families of mammalian PDEs (PDE1-7) have been identified based on
substrate specificity and affinity, sensitivity to cofactors, and sensitivity to inhibitory drugs (Beavo,
J.A. (1995) Physiological Reviews 75:725-48). PDE inhibitors have been found to be particularly
useful in treating various clinical disorders. Rolipram, a specific inhibitor of PDE4, has been used in
the treatment of depression, and similar inhibitors are undergoing evaluation as anti-inflammatory
agents. Theophylline is a nonspecific PDE inhibitor used in the treatment of bronchial asthma and
other respiratory diseases (Banner, K.H. and C.P. Page (1995) Eur. Respir. J. 8:996-1000).
G-Protein Signaling 

Guanine nucleotide binding proteins (G-proteins) are critical mediators of signal transduction
between a particular class of extracellular receptors, the G-protein coupled receptors (GPCR), and
intracellular second messengers such as cAMP and Ca2+. G-proteins are linked to the cytosolic side
of a GPCR such that activation of the GPCR by ligand binding stimulates binding of the G-protein to
GTP, inducing an "active" state in the G-protein. In the active state, the G-protein acts as a signal to
trigger other events in the cell such as the increase of cAMP levels or the release of Ca2+ into the
cytosol from the ER, which, in turn, regulate phosphorylation and activation of other intracellular
proteins. Recycling of the G-protein to the inactive state involves hydrolysis of the bound GTP to
GDP by a GTPase activity in the G-protein. (See Alberts, B. et al. (1994) Molecular Biology of the
Cell, Garland Publishing, Inc., New York NY, pp.734-759.) Two structurally distinct classes of
G-proteins are recognized: heterotrimeric G-proteins, consisting of three different subunits, and
monomeric, low molecular weight (LMW), G-proteins consisting of a single polypeptide chain.

The three polypeptide subunits of heterotrimeric G-proteins are the α, β, and γ subunits. The
α subunit binds and hydrolyzes GTP. The β and γ subunits form a tight complex that anchors the
protein to the inner side of the plasma membrane. The β subunits, also known as G-β proteins or β
transducins, contain seven tandem repeats of the WD-repeat sequence motif, a motif found in many
proteins with regulatory functions. Mutations and variant expression of β transducin proteins are
linked with various disorders (Neer, E.J. et al. (1994) Nature 371:297-300; Margottin, F. et al. (1998)
Mol. Cell 1:565-574).

LMW G-proteins are GTPases which regulate cell growth, cell cycle control, protein
secretion, and intracellular vesicle interaction. They consist of single polypeptides which, like the α
subunit of the heterotrimeric G-proteins, are able to bind and hydrolyze GTP, thus cycling between an
inactive and an active state. At least sixty members of the LMW G-protein superfamily have been
identified and are currently grouped into the six subfamilies of ras, rho, arf, sar1, ran, and rab.
Activated ras genes were initially found in human cancers, and subsequent studies confirmed that ras
function is critical in determining whether cells continue to grow or become differentiated. Other
members of the LMW G-protein superfamily have roles in signal transduction that vary with the
function of the activated genes and the locations of the G-proteins.

Guanine nucleotide exchange factors regulate the activities of LMW G-proteins by 
determining whether GTP or GDP is bound. GTPase-activating protein (GAP) binds to GTP-ras and
induces it to hydrolyze GTP to GDP. In contrast, guanine nucleotide releasing protein (GNRP) binds 
to GDP-ras and induces the release of GDP and the binding of GTP. 
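
The GAP/GNRP regulation described above can be pictured as a two-state switch. The following is only an illustrative sketch, not part of the disclosure: the names `RasState` and `apply_regulator` are hypothetical, and the model ignores kinetics, concentrations, and every other regulator.

```python
from enum import Enum


class RasState(Enum):
    """Nucleotide-bound state of a ras-like LMW G-protein (hypothetical model)."""
    GDP_BOUND = "inactive"
    GTP_BOUND = "active"


def apply_regulator(state: RasState, regulator: str) -> RasState:
    """Return the state after one regulator acts on the G-protein.

    GAP binds GTP-ras and induces hydrolysis of GTP to GDP.
    GNRP binds GDP-ras and induces release of GDP and binding of GTP.
    A regulator that meets the other nucleotide state leaves it unchanged.
    """
    if regulator not in ("GAP", "GNRP"):
        raise ValueError(f"unknown regulator: {regulator!r}")
    if regulator == "GAP" and state is RasState.GTP_BOUND:
        return RasState.GDP_BOUND
    if regulator == "GNRP" and state is RasState.GDP_BOUND:
        return RasState.GTP_BOUND
    return state


if __name__ == "__main__":
    state = RasState.GDP_BOUND
    for regulator in ("GNRP", "GAP"):
        state = apply_regulator(state, regulator)
        print(regulator, "->", state.name, f"({state.value})")
```

In this toy model one GNRP step followed by one GAP step returns the protein to its GDP-bound, inactive state, which mirrors the cycling between inactive and active states described for the LMW G-proteins.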

Other regulators of G-protein signaling (RGS) also exist that act primarily by negatively
regulating the G-protein pathway by an unknown mechanism (Druey, K.M. et al. (1996) Nature
379:742-746). Some 15 members of the RGS family have been identified. RGS family members are
related structurally through similarities in an approximately 120 amino acid region termed the RGS
domain and functionally by their ability to inhibit the interleukin (cytokine) induction of MAP kinase
in cultured mammalian 293T cells (Druey, supra).
Calcium Signaling Molecules 

Ca2+ is another second messenger molecule that is even more widely used as an intracellular
mediator than cAMP. Two pathways exist by which Ca2+ can enter the cytosol in response to
extracellular signals: One pathway acts primarily in nerve signal transduction where Ca2+ enters a
nerve terminal through a voltage-gated Ca2+ channel. The second is a more ubiquitous pathway in
which Ca2+ is released from the ER into the cytosol in response to binding of an extracellular
signaling molecule to a receptor. Ca2+ directly activates regulatory enzymes, such as protein kinase C,
which trigger signal transduction pathways. Ca2+ also binds to specific Ca2+-binding proteins (CBPs)
such as calmodulin (CaM) which then activate multiple target proteins in the cell including enzymes,
membrane transport pumps, and ion channels. CaM interactions are involved in a multitude of cellular
processes including, but not limited to, gene regulation, DNA synthesis, cell cycle progression,
mitosis, cytokinesis, cytoskeletal organization, muscle contraction, signal transduction, ion
homeostasis, exocytosis, and metabolic regulation (Celio, M.R. et al. (1996) Guidebook to
Calcium-binding Proteins, Oxford University Press, Oxford, UK, pp. 15-20). Some CBPs can serve
as a storage depot for Ca2+ in an inactive state. Calsequestrin is one such CBP that is expressed in
isoforms specific to cardiac muscle and skeletal muscle. It is suggested that calsequestrin binds Ca2+
in a rapidly exchangeable state that is released during Ca2+-signaling conditions (Celio, M.R. et al.
(1996) Guidebook to Calcium-binding Proteins, Oxford University Press, New York NY, pp. 222-224).
Cyclins

Cell division is the fundamental process by which all living things grow and reproduce. In
most organisms, the cell cycle consists of three principal steps: interphase, mitosis, and cytokinesis.
Interphase involves preparations for cell division, replication of the DNA, and production of essential
proteins. In mitosis, the nuclear material is divided and separates to opposite sides of the cell.
Cytokinesis is the final division and fission of the cell cytoplasm to produce the daughter cells.

The entry and exit of a cell from mitosis is regulated by the synthesis and destruction of a
family of activating proteins called cyclins. Cyclins act by binding to and activating a group of
cyclin-dependent protein kinases (Cdks) which then phosphorylate and activate selected proteins
involved in the mitotic process. Several types of cyclins exist (Ciechanover, A. (1994) Cell
79:13-21). Two principal types are mitotic cyclin, or cyclin B, which controls entry of the cell into
mitosis, and G1 cyclin, which controls events that drive the cell out of mitosis.
Signal Complex Scaffolding Proteins

Certain proteins in intracellular signaling pathways serve to link or cluster other proteins
involved in the signaling cascade. A conserved protein domain called the PDZ domain has been
identified in various membrane-associated signaling proteins. This domain has been implicated in
receptor and ion channel clustering and in the targeting of multiprotein signaling complexes to
specialized functional regions of the cytosolic face of the plasma membrane. (For a review of PDZ
domain-containing proteins, see Ponting, C.P. et al. (1997) Bioessays 19:469-479.) A large
proportion of PDZ domains are found in the eukaryotic MAGUK (membrane-associated guanylate
kinase) protein family, members of which bind to the intracellular domains of receptors and channels.
However, PDZ domains are also found in diverse membrane-localized proteins such as protein
tyrosine phosphatases, serine/threonine kinases, G-protein cofactors, and synapse-associated proteins
such as syntrophins and neuronal nitric oxide synthase (nNOS). Generally, about one to three PDZ
domains are found in a given protein, although up to nine PDZ domains have been identified in a
single protein.

Membrane Transport Molecules

SEQ ID NO:13 encodes, for example, a membrane transport molecule.

The plasma membrane acts as a barrier to most molecules. Transport between the cytoplasm
and the extracellular environment, and between the cytoplasm and lumenal spaces of cellular
organelles requires specific transport proteins. Each transport protein carries a particular class of
molecule, such as ions, sugars, or amino acids, and often is specific to a certain molecular species of
the class. A variety of human inherited diseases are caused by a mutation in a transport protein. For
example, cystinuria is an inherited disease that results from the inability to transport cystine, the
disulfide-linked dimer of cysteine, from the urine into the blood. Accumulation of cystine in the
urine leads to the formation of cystine stones in the kidneys.

Transport proteins are multi-pass transmembrane proteins, which either actively transport 
molecules across the membrane or passively allow them to cross. Active transport involves 
directional pumping of a solute across the membrane, usually against an electrochemical gradient.
Active transport is tightly coupled to a source of metabolic energy, such as ATP hydrolysis or an
electrochemically favorable ion gradient. Passive transport involves the movement of a solute down
its electrochemical gradient. Transport proteins can be further classified as either carrier proteins or
channel proteins. Carrier proteins, which can function in active or passive transport, bind to a specific
solute to be transported and undergo a conformational change which transfers the bound solute across
the membrane. Channel proteins, which only function in passive transport, form hydrophilic pores
across the membrane. When the pores open, specific solutes, such as inorganic ions, pass through the
membrane and down the electrochemical gradient of the solute. 

Carrier proteins which transport a single solute from one side of the membrane to the other
are called uniporters. In contrast, coupled transporters link the transfer of one solute with
simultaneous or sequential transfer of a second solute, either in the same direction (symport) or in the
opposite direction (antiport). For example, intestinal and kidney epithelium contains a variety of
symporter systems driven by the sodium gradient that exists across the plasma membrane. Sodium
moves into the cell down its electrochemical gradient and brings the solute into the cell with it. The
sodium gradient that provides the driving force for solute uptake is maintained by the ubiquitous
Na+/K+ ATPase. Sodium-coupled transporters include the mammalian glucose transporter (SGLT1),
iodide transporter (NIS), and multivitamin transporter (SMVT). All three transporters have twelve
putative transmembrane segments, extracellular glycosylation sites, and cytoplasmically-oriented
N- and C-termini. NIS plays a crucial role in the evaluation, diagnosis, and treatment of various thyroid
pathologies because it is the molecular basis for radioiodide thyroid-imaging techniques and for
specific targeting of radioisotopes to the thyroid gland (Levy, O. et al. (1997) Proc. Natl. Acad. Sci.
USA 94:5568-5573). SMVT is expressed in the intestinal mucosa, kidney, and placenta, and is
implicated in the transport of the water-soluble vitamins, e.g., biotin and pantothenate (Prasad, P.D. et
al. (1998) J. Biol. Chem. 273:7501-7506). 
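
The uniport, symport, and antiport terminology used above reduces to a simple rule on how many solutes move and in which directions. The sketch below is illustrative only; the function name `classify_carrier` and the "in"/"out" encoding are assumptions made for this example and do not describe any particular transporter's mechanism.

```python
def classify_carrier(solute_directions: dict) -> str:
    """Classify a carrier by the solutes it moves in one transport cycle.

    solute_directions maps a solute name to "in" or "out" (relative to the cell).
    One solute             -> "uniport"
    Two, same direction    -> "symport"
    Two, opposite direction -> "antiport"
    """
    directions = list(solute_directions.values())
    if any(d not in ("in", "out") for d in directions):
        raise ValueError("each direction must be 'in' or 'out'")
    if len(directions) == 1:
        return "uniport"
    if len(directions) == 2:
        return "symport" if directions[0] == directions[1] else "antiport"
    raise ValueError("this sketch handles exactly one or two solutes")


if __name__ == "__main__":
    # Sodium-driven glucose uptake: both solutes move into the cell.
    print(classify_carrier({"Na+": "in", "glucose": "in"}))
    # Sodium entry coupled to calcium exit: opposite directions.
    print(classify_carrier({"Na+": "in", "Ca2+": "out"}))
```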

Transporters play a major role in the regulation of pH, excretion of drugs, and the cellular
K+/Na+ balance. Monocarboxylate anion transporters are proton-coupled symporters with a broad
substrate specificity that includes L-lactate, pyruvate, and the ketone bodies acetate, acetoacetate, and
beta-hydroxybutyrate. At least seven isoforms have been identified to date. The isoforms are predicted
to have twelve transmembrane (TM) helical domains with a large intracellular loop between TM6 and
TM7, and play a critical role in maintaining intracellular pH by removing the protons that are produced
stoichiometrically with lactate during glycolysis. The best characterized H(+)-monocarboxylate
transporter is that of the erythrocyte membrane, which transports L-lactate and a wide range of other
aliphatic monocarboxylates. Other cells possess H(+)-linked monocarboxylate transporters with
differing substrate and inhibitor selectivities. In particular, cardiac muscle and tumor cells have
transporters that differ in their Km values for certain substrates, including stereoselectivity for L- over
D-lactate, and in their sensitivity to inhibitors. There are Na(+)-monocarboxylate cotransporters on the
luminal surface of intestinal and kidney epithelia, which allow the uptake of lactate, pyruvate, and
ketone bodies in these tissues. In addition, there are specific and selective transporters for organic
cations and organic anions in organs including the kidney, intestine and liver. Organic anion
transporters are selective for hydrophobic, charged molecules with electron-attracting side groups.
Organic cation transporters, such as the ammonium transporter, mediate the secretion of a variety of
drugs and endogenous metabolites, and contribute to the maintenance of intercellular pH. (Poole, R.C.
and A.P. Halestrap (1993) Am. J. Physiol. 264:C761-C782; Price, N.T. et al. (1998) Biochem. J.
329:321-328; and Martinelle, K. and I. Haggstrom (1993) J. Biotechnol. 30:339-350.)

The largest and most diverse family of transport proteins known is the ATP-binding cassette
(ABC) transporters. As a family, ABC transporters can transport substances that differ markedly in
chemical structure and size, ranging from small molecules such as ions, sugars, amino acids, peptides,
and phospholipids, to lipopeptides, large proteins, and complex hydrophobic drugs. ABC proteins
consist of four modules: two nucleotide-binding domains (NBD), which hydrolyze ATP to supply the
energy required for transport, and two membrane-spanning domains (MSD), each containing six
putative transmembrane segments. These four modules may be encoded by a single gene, as is the case
for the cystic fibrosis transmembrane regulator (CFTR), or by separate genes. When encoded by
separate genes, each gene product contains a single NBD and MSD. These "half-molecules" form
homo- and heterodimers, such as Tap1 and Tap2, the endoplasmic reticulum-based major
histocompatibility (MHC) peptide transport system. Several genetic diseases are attributed to defects in
ABC transporters, such as the following diseases and their corresponding proteins: cystic fibrosis
(CFTR, an ion channel), adrenoleukodystrophy (adrenoleukodystrophy protein, ALDP), Zellweger
syndrome (peroxisomal membrane protein-70, PMP70), and hyperinsulinemic hypoglycemia
(sulfonylurea receptor, SUR). Overexpression of the multidrug resistance (MDR) protein, another
ABC transporter, in human cancer cells makes the cells resistant to a variety of cytotoxic drugs used in
chemotherapy (Taglicht, D. and S. Michaelis (1998) Meth. Enzymol. 292:131-163).

Transport of fatty acids across the plasma membrane can occur by diffusion, a high capacity,
low affinity process. However, under normal physiological conditions a significant fraction of fatty
acid transport appears to occur via a high affinity, low capacity protein-mediated transport process.
Fatty acid transport protein (FATP), an integral membrane protein with four transmembrane segments,
is expressed in tissues exhibiting high levels of plasma membrane fatty acid flux, such as muscle, heart,
and adipose. Expression of FATP is upregulated in 3T3-L1 cells during adipose conversion, and
expression in COS7 fibroblasts elevates uptake of long-chain fatty acids (Hui, T.Y. et al. (1998) J.
Biol. Chem. 273:27420-27429). 
Ion Channels

The electrical potential of a cell is generated and maintained by controlling the movement of
ions across the plasma membrane. The movement of ions requires ion channels, which form an
ion-selective pore within the membrane. There are two basic types of ion channels, ion transporters and
gated ion channels. Ion transporters utilize the energy obtained from ATP hydrolysis to actively
transport an ion against the ion's concentration gradient. Gated ion channels allow passive flow of an
ion down the ion's electrochemical gradient under restricted conditions. Together, these types of ion
channels generate, maintain, and utilize an electrochemical gradient that is used in 1) electrical impulse
conduction down the axon of a nerve cell, 2) transport of molecules into cells against concentration
gradients, 3) initiation of muscle contraction, and 4) endocrine cell secretion.

Ion transporters generate and maintain the resting dectrical potential of a cell. Utilizing the 
energy derived from ATP hydrolysis, they transport ions against the ion's concentration gradient.
These transmembrane ATPases are divided into three families. The phosphorylated (P) class ion
transporters, including Na+-K+ ATPase, Ca2+-ATPase, and H+-ATPase, are activated by a
phosphorylation event. P-class ion transporters are responsible for maintaining resting potential
distributions such that cytosolic concentrations of Na+ and Ca2+ are low and cytosolic concentration of
K+ is high. The vacuolar (V) class of ion transporters includes pumps on intracellular organelles,
such as lysosomes and Golgi. V-class ion transporters are responsible for generating the low pH within
the lumen of these organelles that is required for function. The coupling factor (F) class consists of
pumps in the mitochondria. F-class ion transporters utilize a proton gradient to generate ATP from
ADP and inorganic phosphate (Pi).

The resting potential of the cell is utilized in many processes involving carrier proteins and
gated ion channels. Carrier proteins utilize the resting potential to transport molecules into and out of
the cell. Amino acid and glucose transport into many cells is linked to sodium ion co-transport
(symport) so that the movement of Na+ down an electrochemical gradient drives transport of the other
molecule up a concentration gradient. Similarly, cardiac muscle links transfer of Ca2+ out of the cell
with transport of Na+ into the cell (antiport).

Ion channels share common structural and mechanistic themes. The channel consists of four or
five subunits or protein monomers that are arranged like a barrel in the plasma membrane. Each
subunit typically consists of six potential transmembrane segments (S1, S2, S3, S4, S5, and S6). The
center of the barrel forms a pore lined by α-helices or β-strands. The side chains of the amino acid
residues comprising the α-helices or β-strands establish the charge (cation or anion) selectivity of the
channel. The degree of selectivity, or what specific ions are allowed to pass through the channel,
depends on the diameter of the narrowest part of the pore.

Gated ion channels control ion flow by regulating the opening and closing of pores. These 
channels are categorized according to the manner of regulating the gating function. Mechanically-gated
channels open pores in response to mechanical stress, voltage-gated channels open pores in response to
changes in membrane potential, and ligand-gated channels open pores in the presence of a specific ion,
nucleotide, or neurotransmitter. 

Voltage-gated Na+ and K+ channels are necessary for the function of electrically excitable cells,
such as nerve and muscle cells. Action potentials, which lead to neurotransmitter release and muscle
contraction, arise from large, transient changes in the permeability of the membrane to Na+ and K+ ions.
Depolarization of the membrane beyond the threshold level opens voltage-gated Na+ channels. Sodium
ions flow into the cell, further depolarizing the membrane and opening more voltage-gated Na+
channels, which propagates the depolarization down the length of the cell. Depolarization also opens
voltage-gated potassium channels. Consequently, potassium ions flow outward, which leads to
repolarization of the membrane. Voltage-gated channels utilize charged residues in the fourth
transmembrane segment (S4) to sense voltage change. The open state lasts only about 1 millisecond, at
which time the channel spontaneously converts into an inactive state that cannot be opened irrespective
of the membrane potential. Inactivation is mediated by the channel's N-terminus, which acts as a plug
that closes the pore. The transition from an inactive to a closed state requires a return to resting 
potential. 
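
The closed, open, and inactive states described above behave like a small state machine. The sketch below is a toy illustration only: the threshold and resting voltages are assumed placeholder values not given in the text, the names `Gate` and `step` are hypothetical, and only the roughly 1 millisecond open time comes from the passage.

```python
from enum import Enum


class Gate(Enum):
    CLOSED = "closed"
    OPEN = "open"
    INACTIVE = "inactive"


OPEN_DURATION_MS = 1.0  # open state lasts about 1 millisecond (from the text)
THRESHOLD_MV = -55.0    # assumed illustrative threshold, not from the text
RESTING_MV = -70.0      # assumed illustrative resting potential, not from the text


def step(state: Gate, time_in_state_ms: float, membrane_mv: float) -> Gate:
    """Return the next gating state of a single voltage-gated channel."""
    if state is Gate.CLOSED:
        # Depolarization beyond threshold opens the channel.
        return Gate.OPEN if membrane_mv > THRESHOLD_MV else Gate.CLOSED
    if state is Gate.OPEN:
        # The open channel spontaneously inactivates after about 1 ms.
        return Gate.INACTIVE if time_in_state_ms >= OPEN_DURATION_MS else Gate.OPEN
    # An inactive channel cannot open at any potential; it returns to the
    # closed state only once the membrane is back at resting potential.
    return Gate.CLOSED if membrane_mv <= RESTING_MV else Gate.INACTIVE


if __name__ == "__main__":
    print(step(Gate.CLOSED, 0.0, -40.0).value)
    print(step(Gate.OPEN, 1.2, -40.0).value)
    print(step(Gate.INACTIVE, 5.0, 0.0).value)
    print(step(Gate.INACTIVE, 5.0, -70.0).value)
```

The key design point is that the inactive state has no direct transition to open, which captures why a depolarized membrane cannot immediately reopen the channel.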

Voltage-gated Na+ channels are heterotrimeric complexes composed of a 260 kDa pore forming
α subunit that associates with two smaller auxiliary subunits, β1 and β2. The β2 subunit is an integral
membrane glycoprotein that contains an extracellular Ig domain, and its association with α and β1
subunits correlates with increased functional expression of the channel, a change in its gating
properties, and an increase in whole cell capacitance due to an increase in membrane surface area.
(Isom, L.L. et al. (1995) Cell 83:433-442.)

Voltage-gated Ca2+ channels are involved in presynaptic neurotransmitter release, and heart
and skeletal muscle contraction. The voltage-gated Ca2+ channels from skeletal muscle (L-type) and
brain (N-type) have been purified, and though their functions differ dramatically, they have similar
subunit compositions. The channels are composed of three subunits. The α1 subunit forms the
membrane pore and voltage sensor, while the α2δ and β subunits modulate the voltage-dependence,
gating properties, and the current amplitude of the channel. These subunits are encoded by at least six
α1, one α2δ, and four β genes. A fourth subunit, γ, has been identified in skeletal muscle. (Walker, D.
et al. (1998) J. Biol. Chem. 273:2361-2367; and Jay, S.D. et al. (1990) Science 248:490-492.)

Chloride channels are necessary in endocrine secretion and in regulation of cytosolic and 
organelle pH. In secretory epithelial cells, Cl- enters the cell across a basolateral membrane through an
Na+, K+/Cl- cotransporter, accumulating in the cell above its electrochemical equilibrium concentration.
Secretion of Cl- from the apical surface, in response to hormonal stimulation, leads to flow of Na+ and
water into the secretory lumen. The cystic fibrosis transmembrane conductance regulator (CFTR) is a
chloride channel encoded by the gene for cystic fibrosis, a common fatal genetic disorder in humans.
Loss of CFTR function decreases transepithelial water secretion and, as a result, the layers of mucus
that coat the respiratory tree, pancreatic ducts, and intestine are dehydrated and difficult to clear. The
resulting blockage of these sites leads to pancreatic insufficiency, "meconium ileus", and devastating
"chronic obstructive pulmonary disease" (Al-Awqati, Q. et al. (1992) J. Exp. Biol. 172:245-266).

Many intracellular organelles contain H+-ATPase pumps that generate transmembrane pH and
electrochemical differences by moving protons from the cytosol to the organelle lumen. If the
membrane of the organelle is permeable to other ions, then the electrochemical gradient can be
abrogated without affecting the pH differential. In fact, removal of the electrochemical barrier allows
more H+ to be pumped across the membrane, increasing the pH differential. Cl- is the sole counterion
of H+ translocation in a number of organelles, including chromaffin granules, Golgi vesicles,
lysosomes, and endosomes. Functions that require a low vacuolar pH include uptake of small
molecules such as biogenic amines in chromaffin granules, processing of vacuolar constituents such as
pro-hormones by proteolytic enzymes, and protein degradation in lysosomes (Al-Awqati, supra).

Ligand-gated channels open their pores when an extracellular or intracellular mediator binds to
the channel. Neurotransmitter-gated channels are channels that open when a neurotransmitter binds to
their extracellular domain. These channels exist in the postsynaptic membrane of nerve or muscle cells.
There are two types of neurotransmitter-gated channels. Sodium channels open in response to
excitatory neurotransmitters, such as acetylcholine, glutamate, and serotonin. This opening causes an
influx of Na+ and produces the initial localized depolarization that activates the voltage-gated channels
and starts the action potential. Chloride channels open in response to inhibitory neurotransmitters, such
as γ-aminobutyric acid (GABA) and glycine, leading to hyperpolarization of the membrane and the
subsequent generation of an action potential.

Ligand-gated channels can be regulated by intracellular second messengers. Calcium-activated
K+ channels are gated by internal calcium ions. In nerve cells, an influx of calcium during
depolarization opens channels to modulate the magnitude of the action potential (Ishi, T.M. et al.
(1997) Proc. Natl. Acad. Sci. USA 94:11651-11656). Cyclic nucleotide-gated (CNG) channels are
gated by cytosolic cyclic nucleotides. The best examples of these are the cAMP-gated Na+ channels
involved in olfaction and the cGMP-gated cation channels involved in vision. Both systems involve
ligand-mediated activation of a G-protein coupled receptor which then alters the level of cyclic
nucleotide within the cell. 

Ion channels are expressed in a number of tissues where they are implicated in a variety of
processes. CNG channels, while abundantly expressed in photoreceptor and olfactory sensory cells, are
also found in kidney, lung, pineal, retinal ganglion cells, testis, aorta, and brain. Calcium-activated K+
channels may be responsible for the vasodilatory effects of bradykinin in the kidney and for shunting
excess K+ from brain capillary endothelial cells into the blood. They are also implicated in repolarizing
granulocytes after agonist-stimulated depolarization (Ishi, supra). Ion channels have been the target for
many drug therapies. Neurotransmitter-gated channels have been targeted in therapies for treatment of
insomnia, anxiety, depression, and schizophrenia. Voltage-gated channels have been targeted in
therapies for arrhythmia, ischemic stroke, head trauma, and neurodegenerative disease (Taylor, C.P.
and L.S. Narasimhan (1997) Adv. Pharmacol. 39:47-98).

Disease Correlation

The etiology of numerous human diseases and disorders can be attributed to defects in the
transport of molecules across membranes. Defects in the trafficking of membrane-bound transporters
and ion channels are associated with several disorders, e.g., cystic fibrosis, glucose-galactose
malabsorption syndrome, hypercholesterolemia, von Gierke disease, and certain forms of diabetes
mellitus. Single-gene defect diseases resulting in an inability to transport small molecules across
membranes include, e.g., cystinuria, iminoglycinuria, Hartnup disease, and Fanconi disease (van't Hoff,
W.G. (1996) Exp. Nephrol. 4:253-262; Talente, G.M. et al. (1994) Ann. Intern. Med. 120:218-226;
and Chillon, M. et al. (1995) New Engl. J. Med. 332:1475-1480). 

Protein Modification and Maintenance Molecules

The cellular processes regulating modification and maintenance of protein molecules
coordinate their conformation, stabilization, and degradation. Each of these processes is mediated by
key enzymes or proteins such as proteases, protease inhibitors, transferases, isomerases, and
molecular chaperones.
Proteases

Proteases cleave proteins and peptides at the peptide bond that forms the backbone of the
peptide and protein chain. Proteolytic processing is essential to cell growth, differentiation,
remodeling, and homeostasis as well as inflammation and immune response. Typical protein
half-lives range from hours to a few days, so that within all living cells, precursor proteins are being
cleaved to their active form, signal sequences proteolytically removed from targeted proteins, and
aged or defective proteins degraded by proteolysis. Proteases function in bacterial, parasitic, and viral
invasion and replication within a host. Four principal categories of mammalian proteases have been
identified based on active site structure, mechanism of action, and overall three-dimensional structure.
(Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University
Press, New York NY, pp. 1-5.)

The serine proteases (SPs) have a serine residue, usually within a conserved sequence, in an 
active site composed of the serine, an aspartate, and a histidine residue. SPs include the digestive 
enzymes trypsin and chymotrypsin, components of the complement cascade and the blood-clotting 
cascade, and enzymes that control extracellular protein degradation. The main SP sub-families are
tryptases, which cleave after arginine or lysine; aspartases, which cleave after aspartate; chymases,
which cleave after phenylalanine or leucine; metases, which cleave after methionine; and serases,
which cleave after serine. Enterokinase, the initiator of intestinal digestion, is a serine protease found
in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active
trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592).
Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and
III and [des-Arg9] bradykinin, shares sequence homology with members of both the serine
carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-
16638). 
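
The sub-family specificities listed above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the names SP_SUBFAMILY_SITES and cleave_after are hypothetical, peptides are written in one-letter amino acid code, and cleavage is assumed to depend only on the residue immediately preceding the scissile bond, whereas real proteases are also influenced by neighboring residues and by substrate structure.

```python
# Illustrative sketch only. Residue sets follow the sub-family descriptions in the
# text; the names below are hypothetical and not taken from any library.
SP_SUBFAMILY_SITES = {
    "trypase": "RK",    # cleaves after arginine or lysine
    "aspartase": "D",   # cleaves after aspartate
    "chymase": "FL",    # cleaves after phenylalanine or leucine
    "metase": "M",      # cleaves after methionine
    "serase": "S",      # cleaves after serine
}


def cleave_after(peptide: str, subfamily: str) -> list[str]:
    """Split a one-letter peptide after every residue the sub-family recognizes.

    Assumption: only the residue before the bond matters. A recognized residue
    at the C-terminus has no following bond, so nothing is cut there.
    """
    sites = SP_SUBFAMILY_SITES[subfamily]
    fragments = []
    start = 0
    for i, residue in enumerate(peptide):
        if residue in sites and i < len(peptide) - 1:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


if __name__ == "__main__":
    print(cleave_after("AKGRWDM", "trypase"))
```

For the example peptide, a trypase-like specificity yields three fragments, while a metase-like specificity leaves the peptide intact because the only methionine is the C-terminal residue.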

Cysteine proteases (CPs) have a cysteine as the major catalytic residue at an active site where
catalysis proceeds via an intermediate thiol ester and is facilitated by adjacent histidine and aspartic
acid residues. CPs are involved in diverse cellular processes ranging from the processing of precursor
proteins to intracellular degradation. Mammalian CPs include lysosomal cathepsins and cytosolic
calcium activated proteases, calpains. CPs are produced by monocytes, macrophages and other cells
of the immune system which migrate to sites of inflammation and secrete molecules involved in
tissue repair. Overabundance of these repair molecules plays a role in certain disorders. In
autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C
degrades collagen, laminin, elastin and other structural proteins found in the extracellular matrix of
bones.

Aspartic proteases are members of the cathepsin family of lysosomal proteases and include
pepsin A, gastricsin, chymosin, renin, and cathepsins D and E. Aspartic proteases have a pair of
aspartic acid residues in the active site, and are most active in the pH 2-3 range, in which one of the
aspartate residues is ionized, the other un-ionized. Aspartic proteases include bacterial
penicillopepsin, mammalian pepsin, renin, chymosin, and certain fungal proteases. Abnormal
regulation and expression of cathepsins is evident in various inflammatory disease states. In cells
isolated from inflamed synovia, the mRNA for stromelysin, cytokines, TIMP-1, cathepsin, gelatinase,
and other molecules is preferentially expressed. Expression of cathepsins L and D is elevated in
synovial tissues from patients with rheumatoid arthritis and osteoarthritis. Cathepsin L expression may
also contribute to the influx of mononuclear cells which exacerbates the destruction of the rheumatoid
synovium. (Keyszer, G.M. (1995) Arthritis Rheum. 38:976-984.) The increased expression and
differential regulation of the cathepsins are linked to the metastatic potential of a variety of cancers and
as such are of therapeutic and prognostic interest (Chambers, A.F. et al. (1993) Crit. Rev. Oncog.
4:95-114).

Metalloproteases have active sites that include two glutamic acid residues and one histidine
residue that serve as binding sites for zinc. Carboxypeptidases A and B are the principal mammalian
metalloproteases. Both are exoproteases of similar structure and active sites. Carboxypeptidase A,
like chymotrypsin, prefers C-terminal aromatic and aliphatic side chains of hydrophobic nature,
whereas carboxypeptidase B is directed toward basic arginine and lysine residues. Glycoprotease
(GCP), or O-sialoglycoprotein endopeptidase, is a metallopeptidase which specifically cleaves
O-sialoglycoproteins such as glycophorin A. Another metallopeptidase, placental leucine
aminopeptidase (P-LAP), degrades several peptide hormones such as oxytocin and vasopressin,
suggesting a role in maintaining homeostasis during pregnancy, and is expressed in several tissues
(Rogi, T. et al. (1996) J. Biol. Chem. 271:56-61).

Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major
pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS
mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory
proteins that control cellular processes such as gene transcription and cell cycle progression. In the
UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small heat stable
protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large,
multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin
protease. The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor
suppressor proteins such as p53, viral proteins, cell surface receptors associated with signal
transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994)
Cell 79:13-21). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose
overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this
gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A.
(1995) Oncogene 10:2179-2183).
Signal Peptidases 

The mechanism for the translocation process into the endoplasmic reticulum (ER) involves
the recognition of an N-terminal signal peptide on the elongating protein. The signal peptide directs
the protein and attached ribosome to a receptor on the ER membrane. The polypeptide chain passes
through a pore in the ER membrane into the lumen while the N-terminal signal peptide remains
attached at the membrane surface. The process is completed when signal peptidase located inside the
ER cleaves the signal peptide from the protein and releases the protein into the lumen.
Protease Inhibitors

Protease inhibitors and other regulators of protease activity control the activity and effects of
proteases. Protease inhibitors have been shown to control pathogenesis in animal models of
proteolytic disorders (Murphy, G. (1991) Agents Actions Suppl. 35:69-76). Low levels of the
cystatins, low molecular weight inhibitors of the cysteine proteases, correlate with malignant
progression of tumors (Calkins, C. et al. (1995) Biol. Biochem. Hoppe Seyler 376:71-80). Serpins
are inhibitors of mammalian plasma serine proteases. Many serpins serve to regulate the blood
clotting cascade and/or the complement cascade in mammals. Sp32 is a positive regulator of the
mammalian acrosomal protease, acrosin, that binds the proenzyme, proacrosin, and thereby aids in
packaging the enzyme into the acrosomal matrix (Baba, T. et al. (1994) J. Biol. Chem. 269:10133-
10140). The Kunitz family of serine protease inhibitors is characterized by one or more "Kunitz
domains" containing a series of cysteine residues that are regularly spaced over approximately 50
amino acid residues and form three intrachain disulfide bonds. Members of this family include
aprotinin, tissue factor pathway inhibitor (TFPI-1 and TFPI-2), inter-α-trypsin inhibitor, and bikunin.
(Marlor, C.W. et al. (1997) J. Biol. Chem. 272:12202-12208.) Members of this family are potent
inhibitors (in the nanomolar range) against serine proteases such as kallikrein and plasmin. Aprotinin
has clinical utility in reduction of perioperative blood loss.

A major portion of all proteins synthesized in eukaryotic cells are synthesized on the
cytosolic surface of the endoplasmic reticulum (ER). Before these immature proteins are distributed
to other organelles in the cell or are secreted, they must be transported into the interior lumen of the
ER where post-translational modifications are performed. These modifications include protein folding
and the formation of disulfide bonds, and N-linked glycosylations.
Protein Isomerases 

Protein folding in the ER is aided by two principal types of protein isomerases, protein 
disulfide isomerase (PDI), and peptidyl-prolyl isomerase (PPI). PDI catalyzes the oxidation of free 
sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins. PPI, an
enzyme that catalyzes the isomerization of certain proline imidic bonds in oligopeptides and proteins,
is considered to govern one of the rate limiting steps in the folding of many proteins to their final
functional conformation. The cyclophilins represent a major class of PPI that was originally
identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher,
R.E. et al. (1984) Science 226:544-547).
Protein Glycosylation

The glycosylation of most soluble secreted and membrane-bound proteins by
oligosaccharides linked to asparagine residues in proteins is also performed in the ER. This reaction
is catalyzed by a membrane-bound enzyme, oligosaccharyl transferase. Although the exact purpose
of this "N-linked" glycosylation is unknown, the presence of oligosaccharides tends to make a
glycoprotein resistant to protease digestion. In addition, oligosaccharides attached to cell-surface
proteins called selectins are known to function in cell-cell adhesion processes (Alberts, B. et al.
(1994) Molecular Biology of the Cell, Garland Publishing Co., New York NY, p. 608). "O-linked"
glycosylation of proteins also occurs in the ER by the addition of N-acetylgalactosamine to the
hydroxyl group of a serine or threonine residue followed by the sequential addition of other sugar
residues to the first. This process is catalyzed by a series of glycosyltransferases each specific for a
particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell
Biology, W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked
oligosaccharides appear to be required for the secretion of proteins or the movement of plasma
membrane glycoproteins to the cell surface.

An additional glycosylation mechanism operates in the ER specifically to target lysosomal
enzymes to lysosomes and prevent their secretion. Lysosomal enzymes in the ER receive an N-linked
oligosaccharide, like plasma membrane and secreted proteins, but are then phosphorylated on one or
two mannose residues. The phosphorylation of mannose residues occurs in two steps, the first step
being the addition of an N-acetylglucosamine phosphate residue by N-acetylglucosamine
phosphotransferase, and the second the removal of the N-acetylglucosamine group by
phosphodiesterase. The phosphorylated mannose residue then targets the lysosomal enzyme to a
mannose 6-phosphate receptor which transports it to a lysosome vesicle (Lodish, supra, pp. 708-711).
Chaperones

Molecular chaperones are proteins that aid in the proper folding of immature proteins and
refolding of improperly folded ones, the assembly of protein subunits, and in the transport of
unfolded proteins across membranes. Chaperones are also called heat-shock proteins (hsp) because of
their tendency to be expressed in dramatically increased amounts following brief exposure of cells to
elevated temperatures. This latter property most likely reflects their need in the refolding of proteins
that have become denatured by the high temperatures. Chaperones may be divided into several
classes according to their location, function, and molecular weight, and include hsp60, TCP1, hsp70,
hsp40 (also called DnaJ), and hsp90. For example, hsp90 binds to steroid hormone receptors,
represses transcription in the absence of the ligand, and provides proper folding of the ligand-binding
domain of the receptor in the presence of the hormone (Burston, S.G. and A.R. Clarke (1995) Essays
Biochem. 29:125-136). Hsp60 and hsp70 chaperones aid in the transport and folding of newly
synthesized proteins. Hsp70 acts early in protein folding, binding a newly synthesized protein before
it leaves the ribosome and transporting the protein to the mitochondria or ER before releasing the
folded protein. Hsp60, along with hsp10, binds misfolded proteins and gives them the opportunity to
refold correctly. All chaperones share an affinity for hydrophobic patches on incompletely folded
proteins and the ability to hydrolyze ATP. The energy of ATP hydrolysis is used to release the hsp-
bound protein in its properly folded state (Alberts, supra, pp. 214, 571-572).

Nucleic Acid Synthesis and Modification Molecules 

SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID
NO:19, and SEQ ID NO:20 encode, for example, nucleic acid synthesis and modification molecules.
Polymerases

DNA and RNA replication are critical processes for cell replication and function. DNA and
RNA replication are mediated by the enzymes DNA and RNA polymerase, respectively, by a
"templating" process in which the nucleotide sequence of a DNA or RNA strand is copied by
complementary base-pairing into a complementary nucleic acid sequence of either DNA or RNA.
However, there are fundamental differences between the two processes.

DNA polymerase catalyzes the stepwise addition of a deoxyribonucleotide to the 3'-OH end of
a polynucleotide strand (the primer strand) that is paired to a second (template) strand. The new DNA
strand therefore grows in the 5' to 3' direction (Alberts, B. et al. (1994) The Molecular Biology of the
Cell, Garland Publishing Inc., New York NY, pp. 251-254). The substrates for the polymerization
reaction are the corresponding deoxynucleotide triphosphates which must base-pair with the correct
nucleotide on the template strand in order to be recognized by the polymerase. Because DNA exists as
a double-stranded helix, each of the two strands may serve as a template for the formation of a new
complementary strand. Each of the two daughter cells of the dividing cell therefore inherits a new DNA
double helix containing one old and one new strand. Thus, DNA is said to be replicated
"semiconservatively" by DNA polymerase. In addition to the synthesis of new DNA, DNA polymerase
is also involved in the repair of damaged DNA as discussed below under "Ligases."
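
The templating and semiconservative replication described above can be sketched in Python. This is a toy model under stated assumptions: the function names synthesize_complement and replicate are hypothetical, both strands are written 5' to 3' as plain strings, and priming, proofreading, and leading/lagging strand synthesis are ignored.

```python
# Illustrative sketch only; names are hypothetical and the model ignores primers,
# proofreading, and the leading/lagging strand distinction.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template: str) -> str:
    """Return the new strand (written 5'->3') that base-pairs with a template
    strand (also written 5'->3'). The strands are antiparallel, so the template
    is read in reverse while the new strand is written forward."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Semiconservative replication: each daughter duplex keeps one old strand
    and pairs it with one newly synthesized strand."""
    old_top, old_bottom = duplex
    daughter_1 = (old_top, synthesize_complement(old_top))        # old top, new bottom
    daughter_2 = (synthesize_complement(old_bottom), old_bottom)  # new top, old bottom
    return [daughter_1, daughter_2]


if __name__ == "__main__":
    parent = ("ATGGC", synthesize_complement("ATGGC"))
    for daughter in replicate(parent):
        print(daughter)
```

Both daughter duplexes carry the same sequence as the parent, and each contains exactly one of the parental strands.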

In contrast to DNA polymerase, RNA polymerase uses a DNA template strand to "transcribe"
DNA into RNA using ribonucleotide triphosphates as substrates. Like DNA polymerization, RNA
polymerization proceeds in a 5' to 3' direction by addition of a ribonucleoside monophosphate to the
3'-OH end of a growing RNA chain. DNA transcription generates messenger RNAs (mRNA) that carry
information for protein synthesis, as well as the transfer, ribosomal, and other RNAs that have
structural or catalytic functions. In eukaryotes, three discrete RNA polymerases synthesize the three
different types of RNA (Alberts, supra, pp. 367-368). RNA polymerase I makes the large ribosomal
RNAs, RNA polymerase II makes the mRNAs that will be translated into proteins, and RNA
polymerase III makes a variety of small, stable RNAs, including 5S ribosomal RNA and the transfer
RNAs (tRNA). In all cases, RNA synthesis is initiated by binding of the RNA polymerase to a
promoter region on the DNA and synthesis begins at a start site within the promoter. Synthesis is
completed at a broad, general stop or termination region in the DNA where both the polymerase and the
completed RNA chain are released.
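
As a minimal illustration of the 5' to 3' chain growth described above, the following Python sketch transcribes a DNA template strand into RNA. The function name transcribe is hypothetical, the template is assumed to be supplied in the 3' to 5' reading direction, and promoter recognition and termination are not modeled.

```python
# Illustrative sketch only; promoter binding and termination are not modeled.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build an RNA chain 5'->3' from a DNA template given in 3'->5' order.

    Each loop iteration adds one ribonucleotide to the 3' end of the growing
    chain, mirroring the stepwise addition described in the text.
    """
    rna = ""
    for base in template_3_to_5:
        rna += RNA_COMPLEMENT[base]
    return rna


if __name__ == "__main__":
    print(transcribe("TACGGA"))
```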
Ligases

DNA repair is the process by which accidental base changes, such as those produced by
oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before
replication or transcription of the DNA can occur. Because of the efficiency of the DNA repair
process, fewer than one in one thousand accidental base changes causes a mutation (Alberts, supra, pp.
245-249). The three steps common to most types of DNA repair are (1) excision of the damaged or
altered base or nucleotide by DNA nucleases, leaving a gap; (2) insertion of the correct nucleotide in
this gap by DNA polymerase using the complementary strand as the template; and (3) sealing the break
left between the inserted nucleotide(s) and the existing DNA strand by DNA ligase. In the last reaction,
DNA ligase uses the energy from ATP hydrolysis to activate the 5' end of the broken phosphodiester
bond before forming the new bond with the 3'-OH of the DNA strand. In Bloom's syndrome, an
inherited human disease, individuals are partially deficient in DNA ligation and consequently have an
increased incidence of cancer (Alberts, supra, p. 247).
Nucleases 

Nucleases comprise both enzymes that hydrolyze DNA (DNase) and RNA (RNase). They
serve different purposes in nucleic acid metabolism. Nucleases hydrolyze the phosphodiester bonds
between adjacent nucleotides either at internal positions (endonucleases) or at the terminal 3' or 5'
nucleotide positions (exonucleases). A DNA exonuclease activity in DNA polymerase, for example,
serves to remove improperly paired nucleotides attached to the 3'-OH end of the growing DNA strand
by the polymerase and thereby serves a "proofreading" function. As mentioned above, DNA
endonuclease activity is involved in the excision step of the DNA repair process.

RNases also serve a variety of functions. For example, RNase P is a ribonucleoprotein enzyme
which cleaves the 5' end of pre-tRNAs as part of their maturation process. RNase H digests the RNA
strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is
an important enzyme in the retroviral replication cycle. Pancreatic RNase secreted by the pancreas into
the intestine hydrolyzes RNA present in ingested foods. RNase activity in serum and cell extracts is
elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-
536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis,
allergic reactions, viral infection and replication, and fungal infections.
Methylases

Methylation of specific nucleotides occurs in both DNA and RNA, and serves different
functions in the two macromolecules. Methylation of cytosine residues to form 5-methyl cytosine in
DNA occurs specifically at CG sequences which are base-paired with one another in the DNA double-
helix. This pattern of methylation is passed from generation to generation during DNA replication by
an enzyme called "maintenance methylase" that acts preferentially on those CG sequences that are base-
paired with a CG sequence that is already methylated. Such methylation appears to distinguish active
from inactive genes by preventing the binding of regulatory proteins that "turn on" the gene, while
permitting the binding of proteins that inactivate the gene (Alberts, supra, pp. 448-451). In RNA
metabolism, "tRNA methylase" produces one of several nucleotide modifications in tRNA that affect the
conformation and base-pairing of the molecule and facilitate the recognition of the appropriate mRNA
codons by specific tRNAs. The primary methylation pattern is the dimethylation of guanine residues to
form N,N-dimethyl guanine.
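
The inheritance of a CG methylation pattern by "maintenance methylase" can be sketched in Python as follows. This is a toy model under stated assumptions: the names cg_sites and inherit_methylation are hypothetical, each CG site is identified by the index of its cytosine on one reference strand, and the enzyme is assumed to act only where the paired CG on the old strand is already methylated.

```python
# Illustrative sketch only; names are hypothetical. Because CG reads the same on
# both strands (5'-CG-3' pairs with 5'-CG-3'), one index per site is enough to
# describe the paired CG sequences on the old and new strands.


def cg_sites(sequence: str) -> list[int]:
    """Return the index of the C in every CG dinucleotide of the sequence."""
    return [i for i in range(len(sequence) - 1) if sequence[i:i + 2] == "CG"]


def inherit_methylation(sequence: str, old_strand_marks: set[int]) -> set[int]:
    """Return the CG sites methylated on the newly made strand.

    A site is methylated on the new strand only if the paired CG on the old
    (template) strand already carries a methyl group, so the parental pattern
    is copied and unmethylated sites stay unmethylated.
    """
    return {site for site in cg_sites(sequence) if site in old_strand_marks}


if __name__ == "__main__":
    seq = "ACGTTCGGCGA"
    print(cg_sites(seq))
    print(sorted(inherit_methylation(seq, {1, 8})))
```

In this model, marks recorded at positions that are not CG sites are ignored, which reflects the stated specificity of the modification for CG sequences.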
Helicases and Single-Stranded Binding Proteins

Helicases are enzymes that destabilize and unwind double helix structures in both DNA and
RNA. Since DNA replication occurs more or less simultaneously on both strands, the two strands must
first separate to generate a replication "fork" for DNA polymerase to act on. Two types of replication
proteins contribute to this process, DNA helicases and single-stranded binding proteins. DNA helicases
hydrolyze ATP and use the energy of hydrolysis to separate the DNA strands. Single-stranded binding
proteins (SSBs) then bind to the exposed DNA strands without covering the bases, thereby temporarily
stabilizing them for templating by the DNA polymerase (Alberts, supra, pp. 255-256).

RNA helicases also alter and regulate RNA conformation and secondary structure. Like the
DNA helicases, RNA helicases utilize energy derived from ATP hydrolysis to destabilize and unwind
RNA duplexes. The most well-characterized and ubiquitous family of RNA helicases is the DEAD-box
family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this
family. Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects,
yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as
translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some
DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis.
Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of
neuroblastoma (Nb) and retinoblastoma (Rb) tumors (Godbout, R. et al. (1998) J. Biol. Chem.
273:21161-21168). These observations suggest that DDX1 may promote or enhance tumor
progression by altering the normal secondary structure and expression levels of RNA in cancer cells.
Other DEAD-box helicases have been implicated either directly or indirectly in tumorigenesis
(discussed in Godbout, supra). For example, murine p68 is mutated in ultraviolet light-induced
tumors, and human DDX6 is located at a chromosomal breakpoint associated with B-cell lymphoma.
Similarly, a chimeric protein comprised of DDX10 and NUP98, a nucleoporin protein, may be involved
in the pathogenesis of certain myeloid malignancies.
Topoisomerases

Besides the need to separate DNA strands prior to replication, the two strands must be
"unwound" from one another prior to their separation by DNA helicases. This function is performed by
proteins known as DNA topoisomerases. DNA topoisomerase effectively acts as a reversible nuclease
that hydrolyzes a phosphodiester bond in a DNA strand, permitting the two strands to rotate freely
about one another to remove the strain of the helix, and then rejoins the original phosphodiester bond
between the two strands. Two types of DNA topoisomerase exist, types I and II. DNA topoisomerase
I causes a single-strand break in a DNA helix to allow the rotation of the two strands of the helix about
the remaining phosphodiester bond in the opposite strand. DNA topoisomerase II causes a transient
break in both strands of a DNA helix where two double helices cross over one another. This type of
topoisomerase can efficiently separate two interlocked DNA circles (Alberts, supra, pp. 260-262). Type
II topoisomerases are largely confined to proliferating cells in eukaryotes, such as cancer cells. For this
reason they are targets for anticancer drugs. Topoisomerase II has been implicated in multi-drug
resistance (MDR) as it appears to aid in the repair of DNA damage inflicted by DNA binding agents
such as doxorubicin and vincristine.
Recombinases 

Genetic recombination is the process of rearranging DNA sequences within an organism's
genome to provide genetic variation for the organism in response to changes in the environment. DNA
recombination allows variation in the particular combination of genes present in an individual's
genome, as well as the timing and level of expression of these genes (see Alberts, supra, pp. 263-273).
Two broad classes of genetic recombination are commonly recognized, general recombination and site-
specific recombination. General recombination involves genetic exchange between any homologous
pair of DNA sequences usually located on two copies of the same chromosome. The process is aided
by enzymes called recombinases that "nick" one strand of a DNA duplex more or less randomly and
permit exchange with the complementary strand of another duplex. The process does not normally
change the arrangement of genes on a chromosome. In site-specific recombination, the recombinase
recognizes specific nucleotide sequences present in one or both of the recombining molecules. Base-
pairing is not involved in this form of recombination, which therefore does not require DNA homology
between the recombining molecules. Unlike general recombination, this form of recombination can
alter the relative positions of nucleotide sequences in chromosomes.
Splicing Factors 

Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA
processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and
splicing to remove introns. The primary RNA transcript from DNA is a faithful copy of the gene
containing both exon and intron sequences, and the latter sequences must be cut out of the RNA
transcript to produce an mRNA that codes for a protein. This "splicing" of the mRNA sequence takes
place in the nucleus with the aid of a large, multicomponent ribonucleoprotein complex known as a
spliceosome. The spliceosomal complex is comprised of five small nuclear ribonucleoprotein particles
(snRNPs) designated U1, U2, U4, U5, and U6, and a number of additional proteins. Each snRNP
contains a single species of snRNA and about ten proteins. The RNA components of some snRNPs
recognize and base pair with intron consensus sequences. The protein components mediate spliceosome
assembly and the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of
patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and
Company, New York NY, p. 863).
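
The removal of introns from a primary transcript can be illustrated with a short Python sketch. It assumes that intron positions are already known and supplied as half-open (start, end) index pairs; the function name splice and the example sequences are hypothetical, and the recognition of intron consensus sequences by snRNPs is not modeled.

```python
# Illustrative sketch only; intron coordinates are given, not discovered.


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each intron interval [start, end) and join the remaining exons.

    Introns are assumed not to overlap; they are processed in order of position.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


if __name__ == "__main__":
    exon_1, intron, exon_2 = "AUGGCC", "GUAAGUUUCAG", "UUAUAA"
    transcript = exon_1 + intron + exon_2
    intron_span = (len(exon_1), len(exon_1) + len(intron))
    print(splice(transcript, [intron_span]))
```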

Adhesion Molecules 

SEQ ID NO:21 and SEQ ID NO:22 encode, for example, adhesion molecules.

The surface of a cell is rich in transmembrane proteoglycans, glycoproteins, glycolipids, and
receptors. These macromolecules mediate adhesion with other cells and with components of the
extracellular matrix (ECM). The interaction of the cell with its surroundings profoundly influences cell
shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated
with signal transduction pathways controlling cell proliferation and differentiation, tissue construction,
and embryonic development.
Cadherins 

Cadherins comprise a family of calcium-dependent glycoproteins that function in mediating
cell-cell adhesion in virtually all solid tissues of multicellular organisms. These proteins share multiple
repeats of a cadherin-specific motif, and the repeats form the folding units of the cadherin extracellular
domain. Cadherin molecules cooperate to form focal contacts, or adhesion plaques, between adjacent
epithelial cells. The cadherin family includes the classical cadherins and protocadherins. Classical
cadherins include the E-cadherin, N-cadherin, and P-cadherin subfamilies. E-cadherin is present on
many types of epithelial cells and is especially important for embryonic development. N-cadherin is
present on nerve, muscle, and lens cells and is also critical for embryonic development. P-cadherin is
present on cells of the placenta and epidermis. Recent studies report that protocadherins are involved in
a variety of cell-cell interactions (Suzuki, S.T. (1996) J. Cell Sci. 109:2609-2611). The intracellular
anchorage of cadherins is regulated by their dynamic association with catenins, a family of cytoplasmic
signal transduction proteins associated with the actin cytoskeleton. The anchorage of cadherins to the
actin cytoskeleton appears to be regulated by protein tyrosine phosphorylation, and the cadherins are
the target of phosphorylation-induced junctional disassembly (Aberle, H. et al. (1996) J. Cell. Biochem.
61:514-523).
Integrins 

Integrins are ubiquitous transmembrane adhesion molecules that link the ECM to the internal
cytoskeleton. Integrins are composed of two noncovalently associated transmembrane glycoprotein
subunits called α and β. Integrins function as receptors that play a role in signal transduction. For
example, binding of integrin to its extracellular ligand may stimulate changes in intracellular calcium
levels or protein kinase activity (Sjaastad, M.D. and W.J. Nelson (1997) BioEssays 19:47-55). At least
ten cell surface receptors of the integrin family recognize the ECM component fibronectin, which is
involved in many different biological processes including cell migration and embryogenesis (Johansson,
S. et al. (1997) Front. Biosci. 2:D126-D146).
Lectins

Lectins comprise a ubiquitous family of extracellular glycoproteins which bind cell surface
carbohydrates specifically and reversibly, resulting in the agglutination of cells (reviewed in Drickamer,
K. and M.E. Taylor (1993) Annu. Rev. Cell Biol. 9:237-264). This function is particularly important
for activation of the immune response. Lectins mediate the agglutination and mitogenic stimulation of
lymphocytes at sites of inflammation (Lasky, L.A. (1991) J. Cell. Biochem. 45:139-146; Paietta, E. et
al. (1989) J. Immunol. 143:2850-2857).

Lectins are further classified into subfamilies based on carbohydrate-binding specificity and
other criteria. The galectin subfamily, in particular, includes lectins that bind β-galactoside
carbohydrate moieties in a thiol-dependent manner (reviewed in Hadari, Y.R. et al. (1998) J. Biol.
Chem. 270:3447-3453). Galectins are widely expressed and developmentally regulated. Because all
galectins lack an N-terminal signal peptide, it is suggested that galectins are externalized through an
atypical secretory mechanism. Two classes of galectins have been defined based on molecular weight
and oligomerization properties. Small galectins form homodimers and are about 14 to 16 kilodaltons in
mass, while large galectins are monomeric and about 29-37 kilodaltons.

Galectins contain a characteristic carbohydrate recognition domain (CRD). The CRD is about
140 amino acids and contains several stretches of about 1-10 amino acids which are highly conserved
among all galectins. A particular 6-amino acid motif within the CRD contains conserved tryptophan
and arginine residues which are critical for carbohydrate binding. The CRD of some galectins also
contains cysteine residues which may be important for disulfide bond formation. Secondary structure
predictions indicate that the CRD forms several β-sheets.

Galectins play a number of roles in diseases and conditions associated with cell-cell and cell-
matrix interactions. For example, certain galectins associate with sites of inflammation and bind to cell
surface immunoglobulin E molecules. In addition, galectins may play an important role in cancer
metastasis. Galectin overexpression is correlated with the metastatic potential of cancers in humans
and mice. Moreover, anti-galectin antibodies inhibit processes associated with cell transformation, such
as cell aggregation and anchorage-independent growth (see, for example, Su, Z.-Z. et al. (1996) Proc.
Natl. Acad. Sci. USA 93:7252-7257).
Selectins

Selectins, or LEC-CAMs, comprise a specialized lectin subfamily involved primarily in
inflammation and leukocyte adhesion (reviewed in Lasky, supra). Selectins mediate the recruitment of
leukocytes from the circulation to sites of acute inflammation and are expressed on the surface of
vascular endothelial cells in response to cytokine signaling. Selectins bind to specific ligands on the
leukocyte cell membrane and enable the leukocyte to adhere to and migrate along the endothelial
surface. Binding of selectin to its ligand leads to polarized rearrangement of the actin cytoskeleton and
stimulates signal transduction within the leukocyte (Brenner, B. et al. (1997) Biochem. Biophys. Res.
Commun. 231:802-807; Hidari, K.I. et al. (1997) J. Biol. Chem. 272:28750-28756). Members of the
selectin family possess three characteristic motifs: a lectin or carbohydrate recognition domain; an
epidermal growth factor-like domain; and a variable number of short consensus repeats (scr or "sushi"
repeats) which are also present in complement regulatory proteins. The selectins include lymphocyte
adhesion molecule-1 (Lam-1 or L-selectin), endothelial leukocyte adhesion molecule-1 (ELAM-1 or E-
selectin), and granule membrane protein-140 (GMP-140 or P-selectin) (Johnston, G.I. et al. (1989) Cell
56:1033-1044).

Antigen Recognition Molecules 

All vertebrates have developed sophisticated and complex immune systems that provide
protection from viral, bacterial, fungal, and parasitic infections. A key feature of the immune system
is its ability to distinguish foreign molecules, or antigens, from "self" molecules. This ability is
mediated primarily by secreted and transmembrane proteins expressed by leukocytes (white blood
cells) such as lymphocytes, granulocytes, and monocytes. Most of these proteins belong to the
immunoglobulin (Ig) superfamily, members of which contain one or more repeats of a conserved
structural domain. This Ig domain is comprised of antiparallel β sheets joined by a disulfide bond in
an arrangement called the Ig fold. Members of the Ig superfamily include T-cell receptors, major
histocompatibility (MHC) proteins, antibodies, and immune cell-specific surface markers such as
CD4, CD8, and CD28.

MHC proteins are cell surface markers that bind to and present foreign antigens to T cells.
MHC molecules are classified as either class I or class II. Class I MHC molecules (MHC I) are
expressed on the surface of almost all cells and are involved in the presentation of antigen to
cytotoxic T cells. For example, a cell infected with virus will degrade intracellular viral proteins and
express the protein fragments bound to MHC I molecules on the cell surface. The MHC I/antigen
complex is recognized by cytotoxic T-cells which destroy the infected cell and the virus within.
Class II MHC molecules are expressed primarily on specialized antigen-presenting cells of the
immune system, such as B-cells and macrophages. These cells ingest foreign proteins from the
extracellular fluid and express MHC II/antigen complex on the cell surface. This complex activates
helper T-cells, which then secrete cytokines and other factors that stimulate the immune response.
MHC molecules also play an important role in organ rejection following transplantation. Rejection
occurs when the recipient's T-cells respond to foreign MHC molecules on the transplanted organ in
the same way as to self MHC molecules bound to foreign antigen. (Reviewed in Alberts, B. et al.
(1994) Molecular Biology of the Cell, Garland Publishing, New York NY, pp. 1229-1246.)

Antibodies, or immunoglobulins, are either expressed on the surface of B-cells or secreted by
B-cells into the circulation. Antibodies bind and neutralize foreign antigens in the blood and other
extracellular fluids. The prototypical antibody is a tetramer consisting of two identical heavy
polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by
disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules.
Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD,
IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of
L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most
common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies
are generally variants or multimers of this basic structure.

H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant
region. The constant region consists of about 110 amino acids in L-chains and about 330 or 440
amino acids in H-chains. The amino acid sequence of the constant region is nearly identical among
H- or L-chains of a particular class. The variable region consists of about 110 amino acids in both H-
and L-chains. However, the amino acid sequence of the variable region differs among H- or L-chains
of a particular class. Within each H- or L-chain variable region are three hypervariable regions of
extensive sequence diversity, each consisting of about 5 to 10 amino acids. In the antibody molecule,
the H- and L-chain hypervariable regions come together to form the antigen recognition site.
(Reviewed in Alberts, supra, pp. 1206-1213 and 1216-1217.)

Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain
contains four Ig domains, three of which occur within the constant region and one of which occurs
within the variable region and contributes to the formation of the antigen recognition site. Likewise,
a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of
which occurs within the variable region.

The immune system is capable of recognizing and responding to any foreign molecule that
enters the body. Therefore, the immune system must be armed with a full repertoire of antibodies
against all potential antigens. Such antibody diversity is generated by somatic rearrangement of gene
segments encoding variable and constant regions. These gene segments are joined together by
site-specific recombination which occurs between highly conserved DNA sequences that flank each gene
segment. Because there are hundreds of different gene segments, millions of unique genes can be
generated combinatorially. In addition, imprecise joining of these segments and an unusually high
rate of somatic mutation within these segments further contribute to the generation of a diverse
antibody population.
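
The combinatorial arithmetic behind this diversity can be illustrated with a minimal sketch. The
segment counts below are assumed, illustrative values chosen only to show the order of magnitude;
they are not taken from this text, and the function names are hypothetical. Junctional imprecision
and somatic mutation, which increase diversity further, are not modeled.

```python
from math import prod

# Illustrative segment counts (assumptions for this sketch, not values from the text).
HEAVY_SEGMENTS = {"V": 40, "D": 25, "J": 6}
KAPPA_SEGMENTS = {"V": 40, "J": 5}
LAMBDA_SEGMENTS = {"V": 30, "J": 4}


def chain_combinations(segments):
    """Count variable regions obtainable by picking one segment of each type."""
    return prod(segments.values())


def antibody_combinations(heavy, light_loci):
    """Pair every heavy-chain combination with every light-chain combination."""
    light_total = sum(chain_combinations(locus) for locus in light_loci)
    return chain_combinations(heavy) * light_total


heavy_total = chain_combinations(HEAVY_SEGMENTS)
total = antibody_combinations(HEAVY_SEGMENTS, [KAPPA_SEGMENTS, LAMBDA_SEGMENTS])
print(heavy_total, total)
```

Under these assumed counts, a few hundred segments already yield on the order of a million
H-chain/L-chain pairings, consistent with the combinatorial argument above.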

T-cell receptors are both structurally and functionally related to antibodies. (Reviewed in
Alberts, supra, pp. 1228-1229.) T-cell receptors are cell surface proteins that bind foreign antigens and
mediate diverse aspects of the immune response. A typical T-cell receptor is a heterodimer comprised
of two disulfide-linked polypeptide chains called α and β. Each chain is about 280 amino acids in
length and contains one variable region and one constant region. Each variable or constant region folds
into an Ig domain. The variable regions from the α and β chains come together in the heterodimer to
form the antigen recognition site. T-cell receptor diversity is generated by somatic rearrangement of
gene segments encoding the α and β chains. T-cell receptors recognize small peptide antigens that are
expressed on the surface of antigen-presenting cells and pathogen-infected cells. These peptide antigens
are presented on the cell surface in association with major histocompatibility proteins which provide the
proper context for antigen reception.

Secreted and Extracellular Matrix Molecules 

SEQ ID NO:25 encodes, for example, a secreted/extracellular matrix molecule.

Protein secretion is essential for cellular function. Protein secretion is mediated by a signal
peptide located at the amino terminus of the protein to be secreted. The signal peptide is comprised of
about ten to twenty hydrophobic amino acids which target the nascent protein from the ribosome to the
endoplasmic reticulum (ER). Proteins targeted to the ER may either proceed through the secretory
pathway or remain in any of the secretory organelles such as the ER, Golgi apparatus, or lysosomes.
Proteins that transit through the secretory pathway are either secreted into the extracellular space or
retained in the plasma membrane. Secreted proteins are often synthesized as inactive precursors that
are activated by post-translational processing events during transit through the secretory pathway.
Such events include glycosylation, proteolysis, and removal of the signal peptide by a signal peptidase.
Other events that may occur during protein transport include chaperone-dependent unfolding and
folding of the nascent protein and interaction of the protein with a receptor or pore complex. Examples
of secreted proteins with amino terminal signal peptides include receptors, extracellular matrix
molecules, cytokines, hormones, growth and differentiation factors, neuropeptides, vasomediators, ion
channels, transporters/pumps, and proteases. (Reviewed in Alberts, B. et al. (1994) Molecular Biology
of the Cell, Garland Publishing, New York NY, pp. 557-560, 582-592.)
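
As a rough illustration of the "about ten to twenty hydrophobic amino acids" criterion, the sketch
below scans the amino-terminal portion of a one-letter protein sequence for its longest uninterrupted
hydrophobic run. The residue set, the 30-residue window, the function names, and both toy sequences
are assumptions made for this example; this is not a validated signal peptide predictor.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue set for this sketch


def longest_hydrophobic_run(sequence, window=30):
    """Return (start, length) of the longest hydrophobic run in the first `window` residues."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, residue in enumerate(sequence[:window].upper()):
        if residue in HYDROPHOBIC:
            if run_len == 0:
                run_start = i  # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # run interrupted by a non-hydrophobic residue
    return best_start, best_len


def has_candidate_signal_peptide(sequence, min_len=10, max_len=20):
    """True if the N-terminal region holds a hydrophobic run of about ten to twenty residues."""
    _, length = longest_hydrophobic_run(sequence)
    return min_len <= length <= max_len


# Hypothetical toy sequences, invented for illustration only.
toy_secreted = "MKLLVALLAIFLLASQAEVTKDE"
toy_cytosolic = "MSDEKRTGSEELKKDAPSGNE"
print(has_candidate_signal_peptide(toy_secreted), has_candidate_signal_peptide(toy_cytosolic))
```

Real signal peptides tolerate occasional non-hydrophobic residues within the core, so a strict
uninterrupted run is a deliberately simple stand-in for the hydrophobicity profiles used in practice.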

The extracellular matrix (ECM) is a complex network of glycoproteins, polysaccharides,
proteoglycans, and other macromolecules that are secreted from the cell into the extracellular space.
The ECM remains in close association with the cell surface and provides a supportive meshwork that
profoundly influences cell shape, motility, strength, flexibility, and adhesion. In fact, adhesion of a
cell to its surrounding matrix is required for cell survival except in the case of metastatic tumor cells,
which have overcome the need for cell-ECM anchorage. This phenomenon suggests that the ECM
plays a critical role in the molecular mechanisms of growth control and metastasis. (Reviewed in
Ruoslahti, E. (1996) Sci. Am. 275:72-77.) Furthermore, the ECM determines the structure and
physical properties of connective tissue and is particularly important for morphogenesis and other
processes associated with embryonic development and pattern formation.

The collagens comprise a family of ECM proteins that provide structure to bone, teeth, skin,
ligaments, tendons, cartilage, blood vessels, and basement membranes. Multiple collagen proteins have
been identified. Three collagen molecules fold together in a triple helix stabilized by interchain disulfide
bonds. Bundles of these triple helices then associate to form fibrils. Collagen primary structure
consists of hundreds of (Gly-X-Y) repeats where about a third of the X and Y residues are Pro.
Glycines are crucial to helix formation as the bulkier amino acid sidechains cannot fold into the triple
helical conformation. Because of these strict sequence requirements, mutations in collagen genes have
severe consequences. Osteogenesis imperfecta patients have brittle bones that fracture easily; in severe
cases patients die in utero or at birth. Ehlers-Danlos syndrome patients have hyperelastic skin,
hypermobile joints, and susceptibility to aortic and intestinal rupture. Chondrodysplasia patients have
short stature and ocular disorders. Alport syndrome patients have hematuria, sensorineural deafness,
and eye lens deformation. (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine,
McGraw-Hill, Inc., New York NY, pp. 2105-2117; and Creighton, T.E. (1984) Proteins, Structures
and Molecular Principles, W.H. Freeman and Company, New York NY, pp. 191-197.)
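
The (Gly-X-Y) sequence requirement can be sketched as a simple check. The code below splits a
one-letter sequence into consecutive triplets, reports positions where a triplet fails to begin with
glycine, and computes the fraction of proline among the X and Y residues. The function names and the
two toy sequences are hypothetical and serve only to illustrate the rule.

```python
def triplets(sequence):
    """Split a sequence into consecutive three-residue units, dropping any partial tail."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def glycine_violations(sequence):
    """Return 0-based positions where a triplet does not begin with Gly (G)."""
    return [3 * n for n, unit in enumerate(triplets(sequence)) if unit[0] != "G"]


def proline_fraction_xy(sequence):
    """Fraction of Pro (P) among the X and Y residues of all triplets."""
    xy = [residue for unit in triplets(sequence) for residue in unit[1:]]
    return sum(residue == "P" for residue in xy) / len(xy) if xy else 0.0


# Hypothetical toy sequences: an intact repeat and one with Gly replaced by Ala.
toy_helix = "GPPGAPGPAGSP"
toy_mutant = "GPPAAPGPAGSP"
print(glycine_violations(toy_helix), glycine_violations(toy_mutant))
```

A single glycine substitution shows up as one violating position, the kind of change that the
paragraph above associates with severe consequences for helix formation.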

Elastin and related proteins confer elasticity to tissues such as skin, blood vessels, and lungs.
Elastin is a highly hydrophobic protein of about 750 amino acids that is rich in proline and glycine
residues. Elastin molecules are highly cross-linked, forming an extensive extracellular network of fibers
and sheets. Elastin fibers are surrounded by a sheath of microfibrils which are composed of a number
of glycoproteins, including fibrillin. Mutations in the gene encoding fibrillin are responsible for
Marfan's syndrome, a genetic disorder characterized by defects in connective tissue. In severe cases,
the aortas of afflicted individuals are prone to rupture. (Reviewed in Alberts, supra, pp. 984-986.)

Fibronectin is a large ECM glycoprotein found in all vertebrates. Fibronectin exists as a dimer
of two subunits, each containing about 2,500 amino acids. Each subunit folds into a rod-like structure
containing multiple domains. The domains each contain multiple repeated modules, the most common
of which is the type III fibronectin repeat. The type III fibronectin repeat is about 90 amino acids in
length and is also found in other ECM proteins and in some plasma membrane and cytoplasmic
proteins. Furthermore, some type III fibronectin repeats contain a characteristic tripeptide consisting of
Arginine-Glycine-Aspartic acid (RGD). The RGD sequence is recognized by the integrin family of cell
surface receptors and is also found in other ECM proteins. Disruption of both copies of the gene
encoding fibronectin causes early embryonic lethality in mice. The mutant embryos display extensive
morphological defects, including defects in the formation of the notochord, somites, heart, blood
vessels, neural tube, and extraembryonic structures. (Reviewed in Alberts, supra, pp. 986-987.)
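
Locating the RGD tripeptide in a protein sequence is a plain substring search. The minimal sketch
below returns every 0-based position at which RGD occurs in a one-letter sequence; the function name
and the short test fragment are assumptions introduced for illustration.

```python
import re


def find_rgd_sites(sequence):
    """Return 0-based start positions of every RGD tripeptide in a one-letter sequence."""
    return [match.start() for match in re.finditer("RGD", sequence.upper())]


# Hypothetical short fragment used only to exercise the function.
toy_fragment = "AVTGRGDSPASSKP"
print(find_rgd_sites(toy_fragment))
```

The presence of RGD alone does not establish integrin binding, since recognition also depends on
the structural context in which the tripeptide is displayed.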

Laminin is a major glycoprotein component of the basal lamina which underlies and supports
epithelial cell sheets. Laminin is one of the first ECM proteins synthesized in the developing embryo.
Laminin is an 850 kilodalton protein composed of three polypeptide chains joined in the shape of a
cross by disulfide bonds. Laminin is especially important for angiogenesis and in particular, for
guiding the formation of capillaries. (Reviewed in Alberts, supra, pp. 990-991.)

There are many other types of proteinaceous ECM components, most of which can be
classified as proteoglycans. Proteoglycans are composed of unbranched polysaccharide chains
(glycosaminoglycans) attached to protein cores. Common proteoglycans include aggrecan, betaglycan,
decorin, perlecan, serglycin, and syndecan-1. Some of these molecules not only provide mechanical
support, but also bind to extracellular signaling molecules, such as fibroblast growth factor and
transforming growth factor β, suggesting a role for proteoglycans in cell-cell communication and cell
growth. (Reviewed in Alberts, supra, pp. 973-978.) Likewise, the glycoproteins tenascin-C and
tenascin-R are expressed in developing and lesioned neural tissue and provide stimulatory and
anti-adhesive (inhibitory) properties, respectively, for axonal growth. (Faissner, A. (1997) Cell Tissue Res.
290:331-341.)



Cytoskeletal Molecules 

SEQ ID NO:26 and SEQ ID NO:27 encode, for example, cytoskeletal molecules.

The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure,
and movement. The cytoskeleton supports the cell membrane and forms tracks along which
organelles and other elements move in the cytosol. The cytoskeleton is a dynamic structure that
allows cells to adopt various shapes and to carry out directed movements. Major cytoskeletal fibers
include the microtubules, the microfilaments, and the intermediate filaments. Motor proteins,
including myosin, dynein, and kinesin, drive movement of or along the fibers. The motor protein
dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the
structure or activity of the fibers while cytoskeletal membrane anchors connect the fibers to the cell
membrane.
Tubulins

Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the
cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell
membrane that are necessary for sweeping materials across an epithelium and for swimming of
sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for
these cells' pliability. Organelles, membrane vesicles, and proteins are transported in the cell along
tracks of microtubules. For example, microtubules run through nerve cell axons, allowing
bi-directional transport of materials and membrane vesicles between the cell body and the nerve
terminal. Failure to supply the nerve terminal with these vesicles blocks the transmission of neural
signals. Microtubules are also critical to chromosomal movement during cell division. Both stable
and short-lived populations of microtubules exist in the cell.

Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a
heterodimer of α- and β-tubulin, multiple isoforms of which exist. The hydrolysis of GTP is linked
to the addition of tubulin subunits at the end of a microtubule. The subunits interact head to tail to
form protofilaments; the protofilaments interact side to side to form a microtubule. A microtubule is
polarized, one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their
rates of assembly. Generally, each microtubule is composed of 13 protofilaments although 11- or
15-protofilament microtubules are sometimes found. Cilia and flagella contain doublet microtubules.
Microtubules grow from specialized structures known as centrosomes or microtubule-organizing
centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet
microtubules. The basal body, the organizing center located at the base of a cilium or flagellum,
contains one centriole. Gamma tubulin present in the MTOC is important for nucleating the
polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules.
Microtubule-Associated Proteins

Microtubule-associated proteins (MAPs) have roles in the assembly and stabilization of
microtubules. One major family of MAPs, assembly MAPs, can be identified in neurons as well as
non-neuronal cells. Assembly MAPs are responsible for cross-linking microtubules in the cytosol.
These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic
projection domain. The projection domain is the binding site for membranes, intermediate filaments, or
other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two
types: Type I and Type II. Type I MAPs, which include MAP1A and MAP1B, are large, filamentous
molecules that co-purify with microtubules and are abundantly expressed in brain and testes. Type I
MAPs contain several repeats of a positively-charged amino acid sequence motif that binds and
neutralizes negatively charged tubulin, leading to stabilization of microtubules. MAP1A and MAP1B
are each derived from a single precursor polypeptide that is subsequently proteolytically processed to
generate one heavy chain and one light chain.

Another light chain, LC3, is a 16.4 kDa molecule that binds MAP1A, MAP1B, and
microtubules. It is suggested that LC3 is synthesized from a source other than the MAP1A or MAP1B
transcripts, and that the expression of LC3 may be important in regulating the microtubule binding
activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol. Chem.
269:11492-11497).

Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized
by three to four copies of an 18-residue sequence in the microtubule-binding domain. MAP2a, MAP2b,
and MAP2c are found only in dendrites, MAP4 is found in non-neuronal cells, and Tau is found in
axons and dendrites of nerve cells. Alternative splicing of the Tau mRNA leads to the existence of
multiple forms of Tau protein. Tau phosphorylation is altered in neurodegenerative disorders such as
Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and
familial frontotemporal dementia and Parkinsonism linked to chromosome 17. The altered Tau
phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal
Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).

The protein pericentrin is found in the MTOC and has a role in microtubule assembly.

Actins 

Microfilaments, cytoskeletal filaments with a diameter of about 7-9 nm, are vital to cell
locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and
disassembly of the microfilaments allow cells to change their morphology. Microfilaments are the
polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells
contain six isoforms of actin. The three α-actins are found in different kinds of muscle, nonmuscle
β-actin and nonmuscle γ-actin are found in nonmuscle cells, and another γ-actin is found in intestinal
smooth muscle cells. G-actin, the monomeric form of actin, polymerizes into polarized, helical
F-actin filaments, accompanied by the hydrolysis of ATP to ADP. Actin filaments associate to form
bundles and networks, providing a framework to support the plasma membrane and determine cell
shape. These bundles and networks are connected to the cell membrane. In muscle cells, thin
filaments containing actin slide past thick filaments containing the motor protein myosin during
contraction. A family of actin-related proteins exist that are not part of the actin cytoskeleton, but
rather associate with microtubules and dynein.

Actin-Associated Proteins
Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin
filaments and in sequestering actin monomers. Several of the actin-associated proteins have multiple
functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins.
These proteins have two actin-binding sites, one for each filament. Short cross-linking proteins
promote bundle formation while longer, more flexible cross-linking proteins promote network
formation. Calmodulin-like calcium-binding domains in actin cross-linking proteins allow calcium
regulation of cross-linking. Group I cross-linking proteins have unique actin-binding domains and
include the 30 kD protein, EF-1a, fascin, and scruin. Group II cross-linking proteins have a
7,000-MW actin-binding domain and include villin and dematin. Group III cross-linking proteins have
pairs of a 26,000-MW actin-binding domain and include fimbrin, spectrin, dystrophin, ABP 120, and
filamin.

Severing proteins regulate the length of actin filaments by breaking them into short pieces or
by blocking their ends. Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin.
Capping proteins can cap the ends of actin filaments, but cannot break filaments. Capping proteins
include CapZ and tropomodulin. The proteins thymosin and profilin sequester actin monomers in the
cytosol, allowing a pool of unpolymerized actin to exist. The actin-associated proteins tropomyosin,
troponin, and caldesmon regulate muscle contraction in response to calcium.
Intermediate Filaments and Associated Proteins 

Intermediate filaments (IFs) are cytoskeletal fibers with a diameter of about 10 nm,
intermediate between that of microfilaments and microtubules. IFs serve structural roles in the cell,
reinforcing cells and organizing cells into tissues. IFs are particularly abundant in epidermal cells and
in neurons. IFs are extremely stable, and, in contrast to microfilaments and microtubules, do not
function in cell motility.

Five types of IF proteins are known in mammals. Type I and Type II proteins are the acidic
and basic keratins, respectively. Heterodimers of the acidic and basic keratins are the building blocks
of keratin IFs. Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as
nails and hair, and in epithelia that line internal body cavities. Mutations in keratin genes lead to
epithelial diseases including epidermolysis bullosa simplex, bullous congenital ichthyosiform
erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar
keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. Some of
these diseases result in severe skin blistering. (See, e.g., Wawersik, M. et al. (1997) J. Biol. Chem.
272:32557-32565; and Corden, L.D. and W.H. McLean (1996) Exp. Dermatol. 5:297-307.)

Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin.
Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting
muscle. Glial fibrillary acidic protein filaments are found in the glial cells that surround neurons and
astrocytes. Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and
mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin
filaments may have roles in keeping the nucleus and other organelles in place in the cell. Type IV IFs
include the neurofilaments and nestin. Neurofilaments, composed of three polypeptides NF-L, NF-M,
and NF-H, are frequently associated with microtubules in axons. Neurofilaments are responsible for
the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission.
Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative
diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien,
J.P. and W.E. Mushynski (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins,
are found in the nucleus where they support the nuclear membrane.

IFs have a central α-helical rod region interrupted by short nonhelical linker segments. The rod
region is bracketed, in most cases, by non-helical head and tail domains. The rod regions of
intermediate filament proteins associate to form a coiled-coil dimer. A highly ordered assembly process
leads from the dimers to the IFs. Neither ATP nor GTP is needed for IF assembly, unlike that of
microfilaments and microtubules.

IF-associated proteins (IFAPs) mediate the interactions of IFs with one another and with other
cell structures. IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and
may cross-link IFs to the microfilament and microtubule cytoskeleton. Microtubules and IFs are in
particular closely associated. IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II,
plectin, ankyrin, filaggrin, and lamin B receptor.
Cytoskeletal-Membrane Anchors

Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These
attachments are important for maintaining cell shape and for muscle contraction. In erythrocytes, the
spectrin-actin cytoskeleton is attached to the cell membrane by three proteins, band 4.1, ankyrin, and
adducin. Defects in this attachment result in abnormally shaped cells which are more rapidly
degraded by the spleen, leading to anemia. In platelets, the spectrin-actin cytoskeleton is also linked
to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin. In
muscle cells the protein dystrophin links actin filaments to the plasma membrane; mutations in the
dystrophin gene lead to Duchenne muscular dystrophy. In adherens junctions and adhesion plaques
the peripheral membrane proteins α-actinin and vinculin attach actin filaments to the cell membrane.

IFs are also attached to membranes by cytoskeletal-membrane anchors. The nuclear lamina is
attached to the inner surface of the nuclear membrane by the lamin B receptor. Vimentin IFs are
attached to the plasma membrane by ankyrin and plectin. Desmosome and hemidesmosome
membrane junctions hold together epithelial cells of organs and skin. These membrane junctions
allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and
rigidity to the epithelium. IFs in epithelial cells are attached to the desmosome by plakoglobin and
desmoplakins. The proteins that link IFs to hemidesmosomes are not known. Desmin IFs surround
the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
Myosin-related Motor Proteins
Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of
ATP with motion. Myosin provides the motor function for muscle contraction and intracellular
movements such as phagocytosis and rearrangement of cell contents during mitotic cell division
(cytokinesis). The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered
arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form
between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the
thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber.

Myosins are composed of one or two heavy chains and associated light chains. Myosin
heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain
binding, and a carboxy-terminal tail domain. The tail domains may associate to form an α-helical
coiled coil. Conventional myosins, such as those found in muscle tissue, are composed of two
myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck
region and play a regulatory role. Unconventional myosins, believed to function in intracellular
motion, may contain either one or two heavy chains and associated light chains. There is evidence for
about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.

Dynein-related Motor Proteins

Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of
dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for
translocation of materials along cytoplasmic microtubules, for example, transport from the nerve
terminal to the cell body and transport of endocytic vesicles to lysosomes. Cytoplasmic dyneins are
also reported to play a role in mitosis. Axonemal dyneins are responsible for the beating of flagella and
cilia. Dynein on one microtubule doublet walks along the adjacent microtubule doublet. This sliding
force produces bending forces that cause the flagellum or cilium to beat. Dyneins have a native mass
between 1000 and 2000 kDa and contain either two or three force-producing heads driven by the
hydrolysis of ATP. The heads are linked via stalks to a basal domain which is composed of a highly
variable number of accessory intermediate and light chains.

Kinesin-related Motor Proteins 

Kinesins are (+) end-directed motor proteins which act on microtubules. The prototypical
kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This function
is particularly important for axonal transport in neurons. Kinesin is also important in all cell types for
the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for
maintaining the identity and functionality of these secretory organelles.

Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at
least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement, and
cellular function. (Reviewed in Moore, J.D. and S.A. Endow (1996) Bioessays 18:207-219; and Hoyt,
A.M. (1994) Curr. Opin. Cell Biol. 6:63-68.) The prototypical kinesin molecule is a heterotetramer
comprised of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs). The
KHC subunits are typically referred to as "kinesin." KHC is about 1000 amino acids in length, and
KLC is about 550 amino acids in length. Two KHCs dimerize to form a rod-shaped molecule with
three distinct regions of secondary structure. At one end of the molecule is a globular motor domain
that functions in ATP hydrolysis and microtubule binding. Kinesin motor domains are highly conserved
and share over 70% identity. Beyond the motor domain is an α-helical coiled-coil region which

molecular cargo. The tail is formed by the interaction of the KHC C-termini with the two KLCs. 

Members of the more divergent subfamilies of kinesins are called kinesin-related proteins
(KRPs), many of which function during mitosis in eukaryotes (Hoyt, supra). Some KRPs are required
for assembly of the mitotic spindle. In vivo and in vitro analyses suggest that these KRPs exert force
on microtubules that comprise the mitotic spindle, resulting in the separation of spindle poles.
Phosphorylation of KRP is required for this activity. Failure to assemble the mitotic spindle results in
abortive mitosis and chromosomal aneuploidy, the latter condition being characteristic of cancer cells.
In addition, a unique KRP, centromere protein E, localizes to the kinetochore of human mitotic
chromosomes and may play a role in their segregation to opposite spindle poles.
Dvnamin-related Motor Proteins 

Dynamin is a large GTPase motor protein that functions as a "molecular pinchase,"
generating a mechanochemical force used to sever membranes. This activity is important in forming
clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in
neurons. Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may
act to constrict a flat membrane surface into a tubule. GTP hydrolysis induces a change in
conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the
membrane tubule and formation of a membrane vesicle. Release of GDP and inorganic phosphate
leads to dynamin disassembly. Following disassembly the dynamin may either dissociate from the
membrane or remain associated to the vesicle and be transported to another region of the cell. Three
homologous dynamin genes have been discovered, in addition to several dynamin-related proteins.
Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology
domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase
activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on
other proteins. Some dynamin-related proteins do not contain the pleckstrin homology domain or the
proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis
(1997) Cell. Signal. 9:395-401.)

The cytoskeleton is reviewed in Lodish, H. et al. (1995) Molecular Cell Biology, Scientific
American Books, New York NY.

Ribosomal Molecules 

SEQ ID NO:30 and SEQ ID NO:31 encode, for example, ribosomal molecules.

Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes,
which are cytoplasmic particles that translate messenger RNA into polypeptides. The eukaryotic
ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the
80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome also contains more
than fifty proteins. The ribosomal proteins have a prefix which denotes the subunit to which they
belong, either L (large) or S (small). Ribosomal protein activities include binding rRNA and
organizing the conformation of the junctions between rRNA helices (Woodson, S.A. and N.B.
Leontis (1998) Curr. Opin. Struct. Biol. 8:294-300; Ramakrishnan, V. and S.W. White (1998) Trends
Biochem. Sci. 23:208-212). Three important sites are identified on the ribosome. The
aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator tRNA) bind
on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as
well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs bind prior to
their release from the ribosome. (The ribosome is reviewed in Stryer, L. (1995) Biochemistry, W.H.
Freeman and Company, New York NY, pp. 888-908; and Lodish, H. et al. (1995) Molecular Cell
Biology, Scientific American Books, New York NY, pp. 119-138.)

Chromatin Molecules 

The nuclear DNA of eukaryotes is organized into chromatin. Two types of chromatin are
observed: euchromatin, some of which may be transcribed, and heterochromatin so densely packed that
much of it is inaccessible to transcription. Chromatin packing thus serves to regulate protein
expression in eukaryotes. Bacteria lack chromatin and the chromatin-packing level of gene regulation.

The fundamental unit of chromatin is the nucleosome of 200 DNA base pairs associated with
two copies each of histones H2A, H2B, H3, and H4. Adjacent nucleosomes are linked by another
class of histones, H1. Low molecular weight non-histone proteins called the high mobility group
(HMG), associated with chromatin, may function in the unwinding of DNA and stabilization of
single-stranded DNA. Chromodomain proteins function in compaction of chromatin into its
transcriptionally silent heterochromatin form.

During mitosis, all DNA is compacted into heterochromatin and transcription ceases.
Transcription in interphase begins with the activation of a region of chromatin. Active chromatin is
decondensed. Decondensation appears to be accompanied by changes in binding coefficient,
phosphorylation and acetylation states of chromatin histones. HMG proteins HMG13 and HMG17
selectively bind activated chromatin. Topoisomerases remove superhelical tension on DNA. The
activated region decondenses, allowing gene regulatory proteins and transcription factors to assemble
on the DNA.

Patterns of chromatin structure can be stably inherited, producing heritable patterns of gene
expression. In mammals, one of the two X chromosomes in each female cell is inactivated by
condensation to heterochromatin during zygote development. The inactive state of this chromosome is
inherited, so that adult females are mosaics of clusters of paternal-X and maternal-X clonal cell groups.
The condensed X chromosome is reactivated in meiosis.

Chromatin is associated with disorders of protein expression such as thalassemia, a genetic
anemia resulting from the removal of the locus control region (LCR) required for decondensation of the
globin gene locus.

For a review of chromatin structure and function see Alberts, B. et al. (1994) Molecular Cell
Biology, third edition, Garland Publishing, Inc., New York NY, pp. 351-354, 433-439.

Electron Transfer Associated Molecules 

SEQ ID NO:23 and SEQ ID NO:24 encode, for example, electron transfer associated molecules.

Electron carriers such as cytochromes accept electrons from NADH or FADH2 and donate
them to other electron carriers. Most electron-transferring proteins, except ubiquinone, are prosthetic
groups such as flavins, heme, FeS clusters, and copper, bound to inner membrane proteins.
Adrenodoxin, for example, is an FeS protein that forms a complex with NADPH:adrenodoxin
reductase and cytochrome p450. Cytochromes contain a heme prosthetic group, a porphyrin ring
containing a tightly bound iron atom. Electron transfer reactions play a crucial role in cellular energy
production.

Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted
to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for
complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2
to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.

Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via
the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and
dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase,
aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including
transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate
dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and
GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by
dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in
the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP
synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone
reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.

ATP synthesis requires membrane transport enzymes including the phosphate transporter and
the ATP-ADP antiport protein. The ATP-binding cassette (ABC) superfamily has also been suggested
as belonging to the mitochondrial transport group (Hogue, D.L. et al. (1999) J. Mol. Biol. 285:379-389).
Brown fat uncoupling protein dissipates oxidative energy as heat, and may be involved in the fever
response to infection and trauma (Cannon, B. et al. (1998) Ann. NY Acad. Sci. 856:171-187).

Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded 
inner membrane, an intermembrane space between the outer and inner membranes, and a matrix 
inside the inner membrane. The outer membrane contains many porin molecules that allow ions and 
charged molecules to enter the intermembrane space, while the inner membrane contains a variety of 
transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy
production in cells.

Mitochondria contain a small amount of DNA. Human mitochondrial DNA encodes 13
proteins, 22 tRNAs, and 2 rRNAs. Mitochondrial DNA-encoded proteins include NADH-Q
reductase, a cytochrome reductase subunit, cytochrome oxidase subunits, and ATP synthase subunits.

Electron-transfer reactions also occur outside the mitochondria in locations such as the
endoplasmic reticulum, which plays a crucial role in lipid and protein biosynthesis. Cytochrome b5
is a central electron donor for various reductive reactions occurring on the cytoplasmic surface of liver
endoplasmic reticulum. Cytochrome b5 has been found in Golgi, plasma, endoplasmic reticulum
(ER), and microbody membranes.

For a review of mitochondrial metabolism and regulation, see Lodish, H. et al. (1995)
Molecular Cell Biology, Scientific American Books, New York NY, pp. 745-797 and Stryer (1995)
Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 529-558, 988-989.

The majority of mitochondrial proteins are encoded by nuclear genes, are synthesized on
cytosolic ribosomes, and are imported into the mitochondria. Nuclear-encoded proteins which are
destined for the mitochondrial matrix typically contain positively-charged amino terminal signal
sequences. Import of these preproteins from the cytoplasm requires a multisubunit protein complex
in the outer membrane known as the translocase of outer mitochondrial membrane (TOM; previously
designated MOM; Pfanner, N. et al. (1996) Trends Biochem. Sci. 21:51-52) and at least three inner
membrane proteins which comprise the translocase of inner mitochondrial membrane (TIM;
previously designated MIM; Pfanner, supra). An inside-negative membrane potential across the inner
mitochondrial membrane is also required for preprotein import. Preproteins are recognized by surface
receptor components of the TOM complex and are translocated through a proteinaceous pore formed
by other TOM components. Proteins targeted to the matrix are then recognized by the import
machinery of the TIM complex. The import systems of the outer and inner membranes can function
independently (Segui-Real, B. et al. (1993) EMBO J. 12:2211-2218).
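
As a rough illustration of the positively-charged amino terminal signal sequences described above, the net charge of the first residues of a protein can be estimated by counting basic and acidic amino acids. The sketch below is only an assumed heuristic: the function name, the 20-residue window, and the example peptide are hypothetical, histidine is ignored, and no claim is made that this is how targeting sequences are identified in practice.

```python
def n_terminal_net_charge(protein, length=20):
    """Crude net charge of the first `length` residues: +1 for K/R, -1 for D/E."""
    head = protein.upper()[:length]
    positive = sum(head.count(aa) for aa in "KR")
    negative = sum(head.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical presequence-like N-terminus followed by an acidic stretch
example = "MLSLRQSIRFFKPATRTLCSSRYLL" + "DEDEDE"
charge = n_terminal_net_charge(example)
print(charge)
```

A clearly positive value is consistent with, but does not establish, a matrix-targeting presequence.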

Once precursor proteins are in the mitochondria, the leader peptide is cleaved by a signal
peptidase to generate the mature protein. Most leader peptides are removed in a one step process by a
protease termed mitochondrial processing peptidase (MPP) (Paces, V. et al. (1993) Proc. Natl. Acad.
Sci. USA 90:5355-5358). In some cases a two-step process occurs in which MPP generates an
intermediate precursor form which is cleaved by a second enzyme, mitochondrial intermediate
peptidase, to generate the mature protein.

Mitochondrial dysfunction leads to impaired calcium buffering, generation of free radicals
that may participate in deleterious intracellular and extracellular processes, changes in mitochondrial
permeability and oxidative damage which is observed in several neurodegenerative diseases.
Neurodegenerative diseases linked to mitochondrial dysfunction include some forms of Alzheimer's
disease, Friedreich's ataxia, familial amyotrophic lateral sclerosis, and Huntington's disease (Beal,
M.F. (1998) Biochim. Biophys. Acta 1366:211-213). The myocardium is heavily dependent on
oxidative metabolism, so mitochondrial dysfunction often leads to heart disease (DiMauro, S. and M.
Hirano (1998) Curr. Opin. Cardiol. 13:190-197). Mitochondria are implicated in disorders of cell
proliferation, since they play an important role in a cell's decision to proliferate or self-destruct
through apoptosis. The oncoprotein Bcl-2, for example, promotes cell proliferation by stabilizing
mitochondrial membranes so that apoptosis signals are not released (Susin, S.A. (1998) Biochim.
Biophys. Acta 1366:151-165).

Transcription Factor Molecules 

SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID
NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, and SEQ
ID NO:43 encode, for example, transcription factor molecules.

Multicellular organisms are comprised of diverse cell types that differ dramatically both in
structure and function. The identity of a cell is determined by its characteristic pattern of gene
expression, and different cell types express overlapping but distinctive sets of genes throughout
development. Spatial and temporal regulation of gene expression is critical for the control of cell
proliferation, cell differentiation, apoptosis, and other processes that contribute to organismal
development. Furthermore, gene expression is regulated in response to extracellular signals that
mediate cell-cell communication and coordinate the activities of different cell types. Appropriate
gene regulation also ensures that cells function efficiently by expressing only those genes whose
functions are required at a given time.

Transcriptional regulatory proteins are essential for the control of gene expression. Some of 
these proteins function as transcription factors that initiate, activate, repress, or terminate gene 
transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory 
regions of a gene in a sequence-specific manner, although some factors bind regulatory elements 
within or downstream of a gene's coding region. Transcription factors may bind to a specific region
of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes
IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.)

The double helix structure and repeated sequences of DNA create topological and chemical
features which can be recognized by transcription factors. These features are hydrogen bond donor
and acceptor groups, hydrophobic patches, major and minor grooves, and regular, repeated stretches
of sequence which induce distinct bends in the helix. Typically, transcription factors recognize 
specific DNA sequence motifs of about 20 nucleotides in length. Multiple, adjacent transcription 
factor-binding motifs may be required for gene regulation. 

Many transcription factors incorporate DNA-binding structural motifs which comprise either
α helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs
are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these
motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.

The helix-turn-helix motif consists of two α helices connected at a fixed angle by a short
chain of amino acids. One of the helices binds to the major groove. Helix-turn-helix motifs are
exemplified by the homeobox motif which is present in homeodomain proteins. These proteins are
critical for specifying the anterior-posterior body axis during development and are conserved
throughout the animal kingdom. The Antennapedia and Ultrabithorax proteins of Drosophila
melanogaster are prototypical homeodomain proteins (Pabo, C.O. and R.T. Sauer (1992) Annu. Rev.
Biochem. 61:1053-1095).

The zinc finger motif, which binds zinc ions, generally contains tandem repeats of about 30
amino acids consisting of periodically spaced cysteine and histidine residues. Examples of this
sequence pattern, designated C2H2 and C3HC4 ("RING" finger), have been described (Lewin, supra).
Zinc finger proteins each contain an α helix and an antiparallel β sheet whose proximity and
conformation are maintained by the zinc ion. Contact with DNA is made by the arginine preceding
the α helix and by the second, third, and sixth residues of the α helix. Variants of the zinc finger
motif include poorly defined cysteine-rich motifs which bind zinc or other metal ions. These motifs
may not contain histidine residues and are generally nonrepetitive.
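
By way of illustration only, the periodic spacing of cysteine and histidine residues in a C2H2-type zinc finger can be written as a regular expression. The sketch below assumes the commonly cited PROSITE-style consensus C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H; the function name and the test peptide are hypothetical and are not taken from the disclosed sequences, and a pattern match alone does not demonstrate zinc binding.

```python
import re

# Assumed C2H2 consensus: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_fingers(protein):
    """Return (start, matched_text) for each non-overlapping C2H2-like match."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


# Hypothetical peptide: short linker, one finger-like repeat, short linker
example = "GSGS" + "C" + "AA" + "C" + "AAA" + "F" + "A" * 8 + "H" + "AAA" + "H" + "GSGS"
hits = find_c2h2_fingers(example)
print(hits)
```

Tandem repeats of the motif would appear as several hits spaced roughly 30 residues apart.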

The leucine zipper motif comprises a stretch of amino acids rich in leucine which can form an
amphipathic α helix. This structure provides the basis for dimerization of two leucine zipper
proteins. The region adjacent to the leucine zipper is usually basic, and upon protein dimerization, is
optimally positioned for binding to the major groove. Proteins containing such motifs are generally
referred to as bZIP transcription factors.

The helix-loop-helix motif (HLH) consists of a short α helix connected by a loop to a longer
α helix. The loop is flexible and allows the two helices to fold back against each other and to bind to
DNA. The transcription factor Myc contains a prototypical HLH motif.

Most transcription factors contain characteristic DNA binding motifs, and variations on the
above motifs and new motifs have been and are currently being characterized (Faisst, S. and S. Meyer
(1992) Nucleic Acids Res. 20:3-26).

Many neoplastic disorders in humans can be attributed to inappropriate gene expression.
Malignant cell growth may result from either excessive expression of tumor promoting genes or
insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104).
Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one
gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in
inappropriate gene transcription, potentially contributing to malignancy.

In addition, the immune system responds to infection or trauma by activating a cascade of
events that coordinate the progressive selection, amplification, and mobilization of cellular defense
mechanisms. A complex and balanced program of gene activation and repression is involved in this
process. However, hyperactivity of the immune system as a result of improper or insufficient
regulation of gene expression may result in considerable tissue or organ damage. This damage is well
documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and
infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw
Hill, Inc. and Teton Data Systems Software).

Furthermore, the generation of multicellular organisms is based upon the induction and
coordination of cell differentiation at the appropriate stages of development. Central to this process is
differential gene expression, which confers the distinct identities of cells and tissues throughout the
body. Failure to regulate gene expression during development can result in developmental disorders.
Human developmental disorders caused by mutations in zinc finger-type transcriptional regulators
include: urogenital developmental abnormalities associated with WT1; Greig
cephalopolysyndactyly, Pallister-Hall syndrome, and postaxial polydactyly type A (GLI3); and
Townes-Brocks syndrome, characterized by anal, renal, limb, and ear abnormalities (SALL1)
(Engelkamp, D. and V. van Heyningen (1996) Curr. Opin. Genet. Dev. 6:334-342; Kohlhase, J. et al.
(1999) Am. J. Hum. Genet. 64:435-445).

Cell Membrane Molecules

SEQ ID NO:28 and SEQ ID NO:29 encode, for example, cell membrane molecules.

Eukaryotic cells are surrounded by plasma membranes which enclose the cell and maintain an
environment inside the cell that is distinct from its surroundings. In addition, eukaryotic organisms
are distinct from prokaryotes in possessing many intracellular organelle and vesicle structures. Many
of the metabolic reactions which distinguish eukaryotic biochemistry from prokaryotic biochemistry
take place within these structures. The plasma membrane and the membranes surrounding organelles
and vesicles are composed of phosphoglycerides, fatty acids, cholesterol, phospholipids, glycolipids,
proteoglycans, and proteins. These components confer identity and functionality to the membranes
with which they associate.

Integral Membrane Proteins

The majority of known integral membrane proteins are transmembrane proteins (TM) which
are characterized by an extracellular, a transmembrane, and an intracellular domain. TM domains are
typically comprised of 15 to 25 hydrophobic amino acids which are predicted to adopt an α-helical
conformation. TM proteins are classified as bitopic (Types I and II) and polytopic (Types III and IV)
(Singer, S.J. (1990) Annu. Rev. Cell Biol. 6:247-296). Bitopic proteins span the membrane once
while polytopic proteins contain multiple membrane-spanning segments. TM proteins function as
cell-surface receptors, receptor-interacting proteins, transporters of ions or metabolites, ion channels,
cell anchoring proteins, and cell type-specific surface antigens.
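
The prediction of hydrophobic membrane-spanning stretches mentioned above can be sketched with a sliding-window hydropathy average. The example below is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, which are conventional choices rather than values from this disclosure; the function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy is >= threshold."""
    seq = protein.upper()
    starts = []
    for i in range(len(seq) - window + 1):
        # Unknown residue codes are scored as 0.0 (an assumption of this sketch)
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            starts.append(i)
    return starts


def count_candidate_segments(protein, window=19, threshold=1.6):
    """Count runs of consecutive qualifying windows; each run is one candidate segment."""
    runs = 0
    previous = None
    for start in hydrophobic_windows(protein, window, threshold):
        if previous is None or start != previous + 1:
            runs += 1
        previous = start
    return runs


# Hypothetical bitopic-like sequence: polar flank, 20 hydrophobic residues, polar flank
example = "MDEKRSNQDE" + "LIVALLAVIGLLVAAILVFL" + "KRRDEESNQK"
print(count_candidate_segments(example))
```

Under these assumptions, one candidate segment would suggest a bitopic protein and several would suggest a polytopic one; experimental confirmation of topology is still required.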

Many membrane proteins (MPs) contain amino acid sequence motifs that target these proteins 
to specific subcellular sites. Examples of these motifs include PDZ domains, KDEL, RGD, NGR,
and GSL sequence motifs, von Willebrand factor A (vWFA) domains, and EGF-like domains. RGD,
NGR, and GSL motif-containing peptides have been used as drug delivery agents in targeted cancer
treatment of tumor vasculature (Arap, W. et al. (1998) Science 279:377-380). Furthermore, MPs may 
also contain amino acid sequence motifs, such as the carbohydrate recognition domain (CRD), that 
mediate interactions with extracellular or intracellular molecules. 
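
The short sequence motifs named above lend themselves to a simple string scan. The following is a minimal sketch, not a validated targeting predictor: the function name and the example peptide are hypothetical, and scoring KDEL only at the C-terminus is an assumption based on its usual role as a C-terminal retention signal.

```python
# Short linear motifs named in the text; KDEL is scored only at the C-terminus (assumption)
INTERNAL_MOTIFS = ("RGD", "NGR", "GSL")


def scan_motifs(protein):
    """Map each motif to the list of 0-based positions where it occurs."""
    seq = protein.upper()
    found = {}
    for motif in INTERNAL_MOTIFS:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        found[motif] = positions
    found["KDEL"] = [len(seq) - 4] if seq.endswith("KDEL") else []
    return found


# Hypothetical peptide containing each motif at least once
example = "MARGDLLNGRAAGSLRGDKDEL"
result = scan_motifs(example)
print(result)
```

Because tripeptides occur frequently by chance, hits from such a scan are only candidates for further analysis.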

G-Protein Coupled Receptors

G-protein coupled receptors (GPCR) are a superfamily of integral membrane proteins which
transduce extracellular signals. GPCRs include receptors for biogenic amines, lipid mediators of
inflammation, peptide hormones, and sensory signal mediators. The structure of these
highly-conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular
N-terminus, and a cytoplasmic C-terminus. Three extracellular loops alternate with three intracellular
loops to link the seven transmembrane regions. Cysteine disulfide bridges connect the second and
third extracellular loops. The most conserved regions of GPCRs are the transmembrane regions and
the first two cytoplasmic loops. A conserved, acidic-Arg-aromatic residue triplet present in the
second cytoplasmic loop may interact with G proteins. A GPCR consensus pattern is characteristic of
most proteins belonging to this superfamily (ExPASy PROSITE document PS00237; and Watson, S.
and S. Arkinstall (1994) The G-protein Linked Receptor Facts Book, Academic Press, San Diego CA,
pp. 2-6). Mutations and changes in transcriptional activation of GPCR-encoding genes have been
associated with neurological disorders such as schizophrenia, Parkinson's disease, Alzheimer's
disease, drug addiction, and feeding disorders.

Scavenger Receptors

Macrophage scavenger receptors with broad ligand specificity may participate in the binding
of low density lipoproteins (LDL) and foreign antigens. Scavenger receptors types I and II are
trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a
transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The
extracellular domain contains a short spacer region, an α-helical coiled-coil region, and a triple helical
collagen-like region. These receptors have been shown to bind a spectrum of ligands, including
chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids,
and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; and Elomaa, O.
et al. (1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in
atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding
bacterial endotoxins, bacteria, and protozoa.

Tetraspan Family Proteins

The transmembrane 4 superfamily (TM4SF) or tetraspan family is a multigene family
encoding type III integral membrane proteins (Wright, M.D. and M.G. Tomlinson (1994) Immunol.
Today 15:588-594). The TM4SF is comprised of membrane proteins which traverse the cell
membrane four times. Members of the TM4SF include platelet and endothelial cell membrane
proteins, melanoma-associated antigens, leukocyte surface glycoproteins, colonal carcinoma antigens,
tumor-associated antigens, and surface proteins of the schistosome parasites (Jankowski, S.A. (1994)
Oncogene 9:1205-1211). Members of the TM4SF share about 25-30% amino acid sequence identity
with one another.

A number of TM4SF members have been implicated in signal transduction, control of cell
adhesion, regulation of cell growth and proliferation, including development and oncogenesis, and
cell motility, including tumor cell metastasis. Expression of TM4SF proteins is associated with a
variety of tumors and the level of expression may be altered when cells are growing or activated.

Tumor Antigens

Tumor antigens are cell surface molecules that are differentially expressed in tumor cells
relative to normal cells. Tumor antigens distinguish tumor cells immunologically from normal cells
and provide diagnostic and therapeutic targets for human cancers (Takagi, S. et al. (1995) Int. J.
Cancer 61:706-715; Liu, E. et al. (1992) Oncogene 7:1027-1032).

Leukocyte Antigens

Other types of cell surface antigens include those identified on leukocytic cells of the immune
system. These antigens have been identified using systematic, monoclonal antibody (mAb)-based
"shot gun" techniques. These techniques have resulted in the production of hundreds of mAbs
directed against unknown cell surface leukocytic antigens. These antigens have been grouped into
"clusters of differentiation" based on common immunocytochemical localization patterns in various
differentiated and undifferentiated leukocytic cell types. Antigens in a given cluster are presumed to
identify a single cell surface protein and are assigned a "cluster of differentiation" or "CD"
designation. Some of the genes encoding proteins identified by CD antigens have been cloned and
verified by standard molecular biology techniques. CD antigens have been characterized as both
transmembrane proteins and cell surface proteins anchored to the plasma membrane via covalent
attachment to fatty acid-containing glycolipids such as glycosylphosphatidylinositol (GPI).
(Reviewed in Barclay, A.N. et al. (1995) The Leucocyte Antigen Facts Book, Academic Press, San
Diego, pp. 17-20.)

Ion Channels 

Ion channels are found in the plasma membranes of virtually every cell in the body. For
example, chloride channels mediate a variety of cellular functions including regulation of membrane
potentials and absorption and secretion of ions across epithelial membranes. Chloride channels also
regulate the pH of organelles such as the Golgi apparatus and endosomes (see, e.g., Greger, R. (1988)
Annu. Rev. Physiol. 50:111-122). Electrophysiological and pharmacological properties of chloride
channels, including ion conductance, current-voltage relationships, and sensitivity to modulators,
suggest that different chloride channels exist in muscles, neurons, fibroblasts, epithelial cells, and
lymphocytes.

Many ion channels have sites for phosphorylation by one or more protein kinases including
protein kinase A, protein kinase C, tyrosine kinase, and casein kinase II, all of which regulate ion
channel activity in cells. Inappropriate phosphorylation of proteins in cells has been linked to
changes in cell cycle progression and cell differentiation. Changes in the cell cycle have been linked
to induction of apoptosis or cancer. Changes in cell differentiation have been linked to diseases and
disorders of the reproductive system, immune system, skeletal muscle, and other organ systems.

Proton Pumps 

Proton ATPases comprise a large class of membrane proteins that use the energy of ATP
hydrolysis to generate an electrochemical proton gradient across a membrane. The resultant gradient
may be used to transport other ions across the membrane (Na+, K+, or Cl-) or to maintain organelle
pH. Proton ATPases are further subdivided into the mitochondrial F-ATPases, the plasma membrane
ATPases, and the vacuolar ATPases. The vacuolar ATPases establish and maintain an acidic pH
within various organelles involved in the processes of endocytosis and exocytosis (Mellman, I. et al.
(1986) Annu. Rev. Biochem. 55:663-700).

Proton-coupled, 12 membrane-spanning domain transporters such as PEPT 1 and PEPT 2 are
responsible for gastrointestinal absorption and for renal reabsorption of peptides using an
electrochemical H+ gradient as the driving force. Another type of peptide transporter, the TAP
transporter, is a heterodimer consisting of TAP 1 and TAP 2 and is associated with antigen
processing. Peptide antigens are transported across the membrane of the endoplasmic reticulum by
TAP so they can be expressed on the cell surface in association with MHC molecules. Each TAP
protein consists of multiple hydrophobic membrane spanning segments and a highly conserved
ATP-binding cassette (Boll, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:284-289). Pathogenic
microorganisms, such as herpes simplex virus, may encode inhibitors of TAP-mediated peptide
transport in order to evade immune surveillance (Marusina, K. and J.J. Monaco (1996) Curr. Opin.
Hematol. 3:19-26).

ABC Transporters 

The ATP-binding cassette (ABC) transporters, also called the "traffic ATPases", comprise a
superfamily of membrane proteins that mediate transport and channel functions in prokaryotes and
eukaryotes (Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8:67-113). ABC proteins share a similar
overall structure and significant sequence homology. All ABC proteins contain a conserved domain
of approximately two hundred amino acid residues which includes one or more nucleotide binding
domains. Mutations in ABC transporter genes are associated with various disorders, such as
hyperbilirubinemia II/Dubin-Johnson syndrome, recessive Stargardt's disease, X-linked
adrenoleukodystrophy, multidrug resistance, celiac disease, and cystic fibrosis.

Peripheral and Anchored Membrane Proteins

Some membrane proteins are not membrane-spanning but are attached to the plasma
membrane via membrane anchors or interactions with integral membrane proteins. Membrane
anchors are covalently joined to a protein post-translationally and include such moieties as prenyl,
myristyl, and glycosylphosphatidyl inositol groups. Membrane localization of peripheral and
anchored proteins is important for their function in processes such as receptor-mediated signal
transduction. For example, prenylation of Ras is required for its localization to the plasma membrane
and for its normal and oncogenic functions in signal transduction.

Vesicle Coat Proteins

Intercellular communication is essential for the development and survival of multicellular
organisms. Cells communicate with one another through the secretion and uptake of protein
signaling molecules. The uptake of proteins into the cell is achieved by the endocytic pathway, in
which the interaction of extracellular signaling molecules with plasma membrane receptors results in
the formation of plasma membrane-derived vesicles that enclose and transport the molecules into the
cytosol. These transport vesicles fuse with and mature into endosomal and lysosomal (digestive)
compartments. The secretion of proteins from the cell is achieved by exocytosis, in which molecules
inside of the cell proceed through the secretory pathway. In this pathway, molecules transit from the
ER to the Golgi apparatus and finally to the plasma membrane, where they are secreted from the cell.

Several steps in the transit of material along the secretory and endocytic pathways require the
formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum
(tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane
(PM), and tubular extensions of the endosomes. Vesicle formation occurs when a region of
membrane buds off from the donor organelle. The membrane-bound vesicle contains proteins to be
transported and is surrounded by a proteinaceous coat, the components of which are recruited from
the cytosol. Two different classes of coat protein have been identified. Clathrin coats form on
vesicles derived from the TGN and PM, whereas coatomer (COP) coats form on vesicles derived
from the ER and Golgi. COP coats can be further classified as COPI, involved in retrograde traffic
through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the
ER to the Golgi (Mellman, supra).

In clathrin-based vesicle formation, adapter proteins bring vesicle cargo and coat proteins
together at the surface of the budding membrane. Adapter protein-1 and -2 select cargo from the
TGN and plasma membrane, respectively, based on molecular information encoded on the
cytoplasmic tail of integral membrane cargo proteins. Adapter proteins also recruit clathrin to the bud
site. Clathrin is a protein complex consisting of three large and three small polypeptide chains
arranged in a three-legged structure called a triskelion. Multiple triskelions and other coat proteins
appear to self-assemble on the membrane to form a coated pit. This assembly process may serve to
deform the membrane into a budding vesicle. GTP-bound ADP-ribosylation factor (Arf) is also
incorporated into the coated assembly. Another small G-protein, dynamin, forms a ring complex
around the neck of the forming vesicle and may provide the mechanochemical force to seal the bud,
thereby releasing the vesicle. The coated vesicle complex is then transported through the cytosol.
During the transport process, Arf-bound GTP is hydrolyzed to GDP, and the coat dissociates from the
transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254).

Vesicles which bud from the ER and the Golgi are covered with a protein coat similar to the
clathrin coat of endocytic and TGN vesicles. The coat protein (COP) is assembled from cytosolic
precursor molecules at specific budding regions on the organelle. The COP coat consists of two
major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar
complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. The
coatomer complex binds to dilysine motifs contained on the cytoplasmic tails of integral membrane
proteins. These include the KKXX retrieval motif of membrane proteins of the ER and
dibasic/diphenylamine motifs of members of the p24 family. The p24 family of type 1 membrane
proteins represent the major membrane proteins of COPI vesicles (Harter, C. and F.T. Wieland (1998)
Proc. Natl. Acad. Sci. USA 95:11649-11654).
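
By way of illustration only, the KKXX retrieval motif described above can be screened for in a
protein sequence. The sketch below assumes one-letter amino acid codes and assumes that the motif
is read at the extreme carboxy terminus of the cytoplasmic tail; the function name has_kkxx_motif
is hypothetical and is not part of this disclosure.

```python
import re

# Assumption: the motif is Lys-Lys followed by any two residues at the
# very end of the sequence (one-letter amino acid codes).
KKXX_PATTERN = re.compile(r"KK[A-Z]{2}$")


def has_kkxx_motif(protein: str) -> bool:
    """Return True if the sequence ends in Lys-Lys-X-X (X = any residue)."""
    return KKXX_PATTERN.search(protein.upper()) is not None
```

Such a check only flags candidate sequences; whether a given tail actually binds coatomer would
still have to be established experimentally.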

Organelle Associated Molecules

SEQ ID NO:44, SEQ ID NO:45, and SEQ ID NO:46 encode, for example, organelle
associated molecules.

Eukaryotic cells are organized into various cellular organelles, which has the effect of
separating specific molecules and their functions from one another and from the cytosol. Within the
cell, various membrane structures surround and define these organelles while allowing them to
interact with one another and the cell environment through both active and passive transport
processes. Important cell organelles include the nucleus, the Golgi apparatus, the endoplasmic
reticulum, mitochondria, peroxisomes, lysosomes, endosomes, and secretory vesicles.
Nucleus 

The cell nucleus contains all of the genetic information of the cell in the form of DNA, and
the components and machinery necessary for replication of DNA and for transcription of DNA into
RNA. (See Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing Inc., New
York NY, pp. 335-399.) DNA is organized into compact structures in the nucleus by interactions
with various DNA-binding proteins such as histones and non-histone chromosomal proteins.
DNA-specific nucleases, DNAses, partially degrade these compacted structures prior to DNA
replication or transcription. DNA replication takes place with the aid of DNA helicases which
unwind the double-stranded DNA helix, and DNA polymerases that duplicate the separated DNA
strands.

Transcriptional regulatory proteins are essential for the control of gene expression. Some of
these proteins function as transcription factors that initiate, activate, repress, or terminate gene
transcription. Transcription factors generally bind to the promoter, enhancer, and upstream
regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory
elements within or downstream of a gene's coding region. Transcription factors may bind to a specific
region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990)
Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.)
Many transcription factors incorporate DNA-binding structural motifs which comprise either α
helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs
are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these
motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.

Many neoplastic disorders in humans can be attributed to inappropriate gene expression.
Malignant cell growth may result from either excessive expression of tumor promoting genes or
insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104).
Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one
gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in
inappropriate gene transcription, potentially contributing to malignancy.

In addition, the immune system responds to infection or trauma by activating a cascade of
events that coordinate the progressive selection, amplification, and mobilization of cellular defense
mechanisms. A complex and balanced program of gene activation and repression is involved in this
process. However, hyperactivity of the immune system as a result of improper or insufficient
regulation of gene expression may result in considerable tissue or organ damage. This damage is well
documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and
infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw
Hill, Inc. and Teton Data Systems Software).

Transcription of DNA into RNA also takes place in the nucleus catalyzed by RNA
polymerases. Three types of RNA polymerase exist. RNA polymerase I makes large ribosomal
RNAs, while RNA polymerase III makes a variety of small, stable RNAs including 5S ribosomal
RNA and the transfer RNAs (tRNA). RNA polymerase II transcribes genes that will be translated
into proteins. The primary transcript of RNA polymerase II is called heterogeneous nuclear RNA
(hnRNA), and must be further processed by splicing to remove non-coding sequences called introns.
RNA splicing is mediated by small nuclear ribonucleoprotein complexes, or snRNPs, producing
mature messenger RNA (mRNA) which is then transported out of the nucleus for translation into
proteins.

Nucleolus

The nucleolus is a highly organized subcompartment in the nucleus that contains high
concentrations of RNA and proteins and functions mainly in ribosomal RNA synthesis and assembly
(Alberts, et al. supra, pp. 379-382). Ribosomal RNA (rRNA) is a structural RNA that is complexed
with proteins to form ribonucleoprotein structures called ribosomes. Ribosomes provide the platform
on which protein synthesis takes place.

Ribosomes are assembled in the nucleolus initially from a large, 45S rRNA combined with a
variety of proteins imported from the cytoplasm, as well as smaller, 5S rRNAs. Later processing of
the immature ribosome results in formation of smaller ribosomal subunits which are transported from
the nucleolus to the cytoplasm where they are assembled into functional ribosomes.
Endoplasmic Reticulum 

In eukaryotes, proteins are synthesized within the endoplasmic reticulum (ER), delivered from
the ER to the Golgi apparatus for post-translational processing and sorting, and transported from the
Golgi to specific intracellular and extracellular destinations. Synthesis of integral membrane proteins,
secreted proteins, and proteins destined for the lumen of a particular organelle occurs on the rough
endoplasmic reticulum (ER). The rough ER is so named because of the rough appearance in electron
micrographs imparted by the attached ribosomes on which protein synthesis proceeds. Synthesis of
proteins destined for the ER actually begins in the cytosol with the synthesis of a specific signal
peptide which directs the growing polypeptide and its attached ribosome to the ER membrane where
the signal peptide is removed and protein synthesis is completed. Soluble proteins destined for the
ER lumen, for secretion, or for transport to the lumen of other organelles pass completely into the ER
lumen. Transmembrane proteins destined for the ER or for other cell membranes are translocated
across the ER membrane but remain anchored in the lipid bilayer of the membrane by one or more
membrane-spanning α-helical regions.

Translocated polypeptide chains destined for other organelles or for secretion also fold and
assemble in the ER lumen with the aid of certain "resident" ER proteins. Protein folding in the ER is
aided by two principal types of protein isomerases, protein disulfide isomerase (PDI), and
peptidyl-prolyl isomerase (PPI). PDI catalyzes the oxidation of free sulfhydryl groups in cysteine
residues to form intramolecular disulfide bonds in proteins. PPI, an enzyme that catalyzes the
isomerization of certain proline imide bonds in oligopeptides and proteins, is considered to govern
one of the rate limiting steps in the folding of many proteins to their final functional conformation.
The cyclophilins represent a major class of PPI that was originally identified as the major receptor
for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science
226:544-547). Molecular "chaperones" such as BiP (binding protein) in the ER recognize incorrectly
folded proteins as well as proteins not yet folded into their final form and bind to them, both to
prevent improper aggregation between them, and to promote proper folding.

The "N-linked" glycosylation of most soluble secreted and membrane-bound proteins by
oligosaccharides linked to asparagine residues in proteins is also performed in the ER. This reaction is
catalyzed by a membrane-bound enzyme, oligosaccharyl transferase.
Golgi Apparatus 

The Golgi apparatus is a complex structure that lies adjacent to the ER in eukaryotic cells and
serves primarily as a sorting and dispatching station for products of the ER (Alberts, et al. supra, pp.
600-610). Additional posttranslational processing, principally additional glycosylation, also occurs in
the Golgi. Indeed, the Golgi is a major site of carbohydrate synthesis, including most of the
glycosaminoglycans of the extracellular matrix. N-linked oligosaccharides, added to proteins in the
ER, are also further modified in the Golgi by the addition of more sugar residues to form complex
N-linked oligosaccharides. "O-linked" glycosylation of proteins also occurs in the Golgi by the
addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue followed by
the sequential addition of other sugar residues to the first. This process is catalyzed by a series of
glycosyltransferases each specific for a particular donor sugar nucleotide and acceptor molecule
(Lodish, H. et al. (1995) Molecular Cell Biology, W.H. Freeman and Co., New York NY, pp. 700-
708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of
proteins or the movement of plasma membrane glycoproteins to the cell surface.

The terminal compartment of the Golgi is the Trans-Golgi Network (TGN), where both
membrane and lumenal proteins are sorted for their final destination. Transport (or secretory) vesicles
destined for intracellular compartments, such as lysosomes, bud off of the TGN. Other transport
vesicles bud off containing proteins destined for the plasma membrane, such as receptors, adhesion
molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive
enzymes.
Vacuoles 

The vacuole system is a collection of membrane bound compartments in eukaryotic cells that
functions in the processes of endocytosis and exocytosis. They include phagosomes, lysosomes,
endosomes, and secretory vesicles. Endocytosis is the process in cells of internalizing nutrients, solutes
or small particles (pinocytosis) or large particles such as internalized receptors, viruses, bacteria, or
bacterial toxins (phagocytosis). Exocytosis is the process of transporting molecules to the cell surface.
It facilitates placement or localization of membrane-bound receptors or other membrane proteins and
secretion of hormones, neurotransmitters, digestive enzymes, and wastes.

A common property of all of these vacuoles is an acidic pH environment ranging from
approximately pH 4.5-5.0. This acidity is maintained by the presence of a proton ATPase that uses the
energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane (Mellman,
I. et al. (1986) Annu. Rev. Biochem. 55:663-700). Eukaryotic vacuolar proton ATPase (vp-ATPase) is
a multimeric enzyme composed of 3-10 different subunits. One of these subunits is a highly
hydrophobic polypeptide of approximately 16 kDa that is similar to the proteolipid component of
vp-ATPases from eubacteria, fungi, and plant vacuoles (Mandel, M. et al. (1988) Proc. Natl. Acad. Sci.
USA 85:5521-5524). The 16 kDa proteolipid component is the major subunit of the membrane portion
of vp-ATPase and functions in the transport of protons across the membrane.
Lysosomes

Lysosomes are membranous vesicles containing various hydrolytic enzymes used for the
controlled intracellular digestion of macromolecules. Lysosomes contain some 40 types of enzymes
including proteases, nucleases, glycosidases, lipases, phospholipases, phosphatases, and sulfatases, all
of which are acid hydrolases that function at a pH of about 3. Lysosomes are surrounded by a unique
membrane containing transport proteins that allow the final products of macromolecule degradation,
such as sugars, amino acids, and nucleotides, to be transported to the cytosol where they may be
either excreted or reutilized by the cell. A vp-ATPase, such as that described above, maintains the
acidic environment necessary for hydrolytic activity (Alberts, supra, pp. 610-611).
Endosomes 

Endosomes are another type of acidic vacuole that is used to transport substances from the
cell surface to the interior of the cell in the process of endocytosis. Like lysosomes, endosomes have
an acidic environment provided by a vp-ATPase (Alberts et al. supra, pp. 610-618). Two types of
endosomes are apparent based on tracer uptake studies that distinguish their time of formation in the
cell and their cellular location. Early endosomes are found near the plasma membrane and appear to
function primarily in the recycling of internalized receptors back to the cell surface. Late endosomes
appear later in the endocytic process close to the Golgi apparatus and the nucleus, and appear to be
associated with delivery of endocytosed material to lysosomes or to the TGN where they may be
recycled. Specific proteins are associated with particular transport vesicles and their target
compartments that may provide selectivity in targeting vesicles to their proper compartments. A
cytosolic prenylated GTP-binding protein, Rab, is one such protein. Rabs 4, 5, and 11 are associated
with the early endosome, whereas Rabs 7 and 9 associate with the late endosome.
Mitochondria 

Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded
inner membrane, an intermembrane space between the outer and inner membranes, and a matrix
inside the inner membrane. The outer membrane contains many porin molecules that allow ions and
charged molecules to enter the intermembrane space, while the inner membrane contains a variety of
transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy
production in cells.

Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted
to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for
complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2
to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.

Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via
the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and
dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase,
aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including
transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate
dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and
GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by
dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in
the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP
synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone
reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
Peroxisomes 

Peroxisomes, like mitochondria, are a major site of oxygen utilization. They contain one or
more enzymes, such as catalase and urate oxidase, that use molecular oxygen to remove hydrogen
atoms from specific organic substrates in an oxidative reaction that produces hydrogen peroxide
(Alberts, supra, pp. 574-577). Catalase oxidizes a variety of substrates including phenols, formic
acid, formaldehyde, and alcohol and is important in peroxisomes of liver and kidney cells for
detoxifying various toxic molecules that enter the bloodstream. Another major function of oxidative
reactions in peroxisomes is the breakdown of fatty acids in a process called β oxidation. β oxidation
results in shortening of the alkyl chain of fatty acids by blocks of two carbon atoms that are converted
to acetyl CoA and exported to the cytosol for reuse in biosynthetic reactions.
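
The two-carbon bookkeeping described above can be made concrete with a short calculation. The
sketch below is illustrative only: it assumes a saturated fatty acid with an even number of carbon
atoms and assumes that oxidation runs to completion, which need not be the case in every
peroxisome. The function name beta_oxidation_yield is hypothetical.

```python
def beta_oxidation_yield(carbons: int) -> tuple[int, int]:
    """Return (cycles, acetyl_coa_units) for a saturated, even-chain fatty acid."""
    if carbons < 4 or carbons % 2:
        raise ValueError("sketch covers even-chain fatty acids of 4+ carbons only")
    acetyl_coa_units = carbons // 2   # every two carbons become one acetyl CoA
    cycles = acetyl_coa_units - 1     # the final cycle releases two acetyl CoA units
    return cycles, acetyl_coa_units
```

For a 16-carbon chain such as palmitate, this gives seven cycles and eight acetyl CoA units.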

Also like mitochondria, peroxisomes import their proteins from the cytosol using a specific
signal sequence located near the C-terminus of the protein. The importance of this import process is
evident in the inherited human disease Zellweger syndrome, in which a defect in importing proteins
into peroxisomes leads to a peroxisomal deficiency resulting in severe abnormalities in the brain,
liver, and kidneys, and death soon after birth. One form of this disease has been shown to be due to a
mutation in the gene encoding a peroxisomal integral membrane protein called peroxisome assembly
factor-1.

The discovery of new human molecules for diagnostics and therapeutics satisfies a need in the
art by providing new compositions which are useful in the diagnosis, study, prevention, and treatment
of diseases associated with human molecules.

SUMMARY OF THE INVENTION 

The present invention relates to nucleic acid sequences comprising human diagnostic and
therapeutic polynucleotides (dithp) as presented in the Sequence Listing. Some of the dithp uniquely
identify genes encoding human structural, functional, and regulatory molecules.

The invention provides an isolated polynucleotide comprising a polynucleotide sequence
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of
SEQ ID NO:1-52; b) a naturally occurring polynucleotide sequence having at least 90% sequence
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52; c) a
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and
e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52. In another alternative,
the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide sequence selected
from the group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID
NO:1-52; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52; c) a polynucleotide
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA
equivalent of a) through d). The invention further provides a composition for the detection of
expression of human diagnostic and therapeutic polynucleotides, comprising at least one isolated
polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52; b) a naturally
occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence
selected from the group consisting of SEQ ID NO:1-52; c) a polynucleotide sequence complementary to
a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d); and a
detectable label.
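
For orientation only, the relationships recited in items a) through e) above (a complementary
sequence, an RNA equivalent, and a percent sequence identity) can be sketched computationally. The
helper names below are hypothetical, and the identity calculation assumes two sequences of equal
length that have already been aligned without gaps; it is a simplification and is not the alignment
method relied upon in this disclosure.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Return the same sequence with thymine replaced by uracil."""
    return dna.upper().replace("T", "U")


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)
```

Under these assumptions, a candidate would meet the 90% threshold of item b) when
percent_identity returns a value of 90.0 or greater.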

The invention also provides a method for detecting a target polynucleotide in a sample, said
target polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52; b) a naturally
occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence
selected from the group consisting of SEQ ID NO:1-52; c) a polynucleotide sequence complementary to
a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The
method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides
comprising a sequence complementary to said target polynucleotide in the sample, and which probe
specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex
is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of
said hybridization complex, and, optionally, if present, the amount thereof. In one alternative, the probe
comprises at least 30 contiguous nucleotides. In another alternative, the probe comprises at least 60
contiguous nucleotides.
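
As a rough in silico counterpart to the probe requirement above, one can check whether a candidate
probe of sufficient length is exactly complementary to a contiguous stretch of the target. The
sketch below is an assumption-laden simplification: the function name is hypothetical, only DNA
letters are handled, and exact complementarity is used as a stand-in for specific hybridization,
which in practice depends on the hybridization conditions.

```python
def probe_is_complementary(probe: str, target: str, min_length: int = 20) -> bool:
    """True if the probe has at least min_length nucleotides and its reverse
    complement occurs as a contiguous stretch of the target (DNA letters only)."""
    complement = str.maketrans("ACGT", "TGCA")
    probe = probe.upper()
    if len(probe) < min_length:
        return False
    return probe.translate(complement)[::-1] in target.upper()
```

Setting min_length to 30 or 60 would mirror the alternative probe lengths recited above.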

The invention further provides a recombinant polynucleotide comprising a promoter sequence
operably linked to an isolated polynucleotide comprising a polynucleotide sequence selected from the
group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-
52; b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52; c) a polynucleotide
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA
equivalent of a) through d). In one alternative, the invention provides a cell transformed with the
recombinant polynucleotide. In another alternative, the invention provides a transgenic organism
comprising the recombinant polynucleotide. In a further alternative, the invention provides a method
for producing a human diagnostic and therapeutic polypeptide, the method comprising a) culturing a
cell under conditions suitable for expression of the human diagnostic and therapeutic polypeptide,
wherein said cell is transformed with the recombinant polynucleotide, and b) recovering the human
diagnostic and therapeutic polypeptide so expressed.

The invention also provides a purified human diagnostic and therapeutic polypeptide (DITHP)
encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group
consisting of SEQ ID NO:1-52. Additionally, the invention provides an isolated antibody which
specifically binds to the human diagnostic and therapeutic polypeptide. The invention further provides
a method of identifying a test compound which specifically binds to the human diagnostic and
therapeutic polypeptide, the method comprising the steps of a) providing a test compound; b) combining
the human diagnostic and therapeutic polypeptide with the test compound for a sufficient time and
under suitable conditions for binding; and c) detecting binding of the human diagnostic and therapeutic
polypeptide to the test compound, thereby identifying the test compound which specifically binds the
human diagnostic and therapeutic polypeptide.

The invention further provides a microarray wherein at least one element of the microarray is
an isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide comprising
a polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected
from the group consisting of SEQ ID NO:1-52; b) a naturally occurring polynucleotide sequence having
at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ
ID NO:1-52; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence
complementary to b); and e) an RNA equivalent of a) through d). The invention also provides a method
for generating a transcript image of a sample which contains polynucleotides. The method comprises a)
labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled
polynucleotides of the sample under conditions suitable for the formation of a hybridization complex,
and c) quantifying the expression of the polynucleotides in the sample.

Additionally, the invention provides a method for screening a compound for effectiveness in
altering expression of a target polynucleotide, wherein said target polynucleotide comprises a
polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected
from the group consisting of SEQ ID NO:1-52; b) a naturally occurring polynucleotide sequence having
at least 90% sequence identity to a polynucleotide sequence selected from the group consisting of SEQ
ID NO:1-52; c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence
complementary to b); and e) an RNA equivalent of a) through d). The method comprises a) exposing a
sample comprising the target polynucleotide to a compound, and b) detecting altered expression of the
target polynucleotide.

The invention further provides a method for detecting a target polynucleotide in a sample for
toxicity testing of a compound, said target polynucleotide comprising a polynucleotide sequence
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting of
SEQ ID NO:1-52; b) a naturally occurring polynucleotide sequence having at least 90% sequence
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52; c) a
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); and
e) an RNA equivalent of a) through d). The method comprises a) hybridizing the sample with a probe
comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target
polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide,
under conditions whereby a hybridization complex is formed between said probe and said target
polynucleotide, b) detecting the presence or absence of said hybridization complex, and, optionally, if
present, the amount thereof, and c) comparing the presence, absence or amount of said target
polynucleotide in a first biological sample and a second biological sample, wherein said first
biological sample has been contacted with said compound, and said second sample is a control,
whereby a change in presence, absence or amount of said target polynucleotide in said first sample, as
compared with said second sample, is indicative of toxic response to said compound.
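
The comparison in step c) above can be pictured as a simple ratio of the amount measured in the
compound-treated sample to the amount measured in the control. The sketch below is illustrative
only: the function name, the use of a ratio, and the two-fold threshold are all assumptions
introduced here and are not values stated in this disclosure.

```python
def expression_change(treated: float, control: float, threshold: float = 2.0) -> str:
    """Classify the treated/control ratio of a target polynucleotide.

    The two-fold default threshold is an arbitrary, illustrative choice.
    """
    if treated < 0 or control < 0:
        raise ValueError("amounts must be non-negative")
    if control == 0:
        # Target absent in the control: any signal in the treated sample is a change.
        return "increased" if treated > 0 else "unchanged"
    ratio = treated / control
    if ratio >= threshold:
        return "increased"
    if ratio <= 1 / threshold:
        return "decreased"
    return "unchanged"
```

A result of "increased" or "decreased" would correspond to the change in presence, absence or
amount that the method treats as indicative of a toxic response.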

DESCRIPTION OF THE TABLES 
Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification
numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their
GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the
GenBank hits.

Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification
numbers (template IDs) corresponding to the polynucleotides of the present invention, along with
polynucleotide segments of each template sequence as defined by the indicated "start" and "stop"
nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, Pfam
descriptions, and E-values corresponding to the polypeptide domains encoded by the polynucleotide
segments are indicated.

Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification
numbers (template IDs) corresponding to the polynucleotides of the present invention, along with
polynucleotide segments of each template sequence as defined by the indicated "start" and "stop"
nucleotide positions. The reading frames of the polynucleotide segments are shown, and the
polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or
transmembrane (TM) domains, as indicated.

Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template identification
numbers (template IDs) corresponding to the polynucleotides of the present invention, along with
component sequence identification numbers (component IDs) corresponding to each template. The
component sequences, which were used to assemble the template sequences, are defined by the indicated
"start" and "stop" nucleotide positions along each template.

Table 5 summarizes the bioinformatics tools which are useful for analysis of the
polynucleotides of the present invention. The first column of Table 5 lists analytical tools, programs,
and algorithms, the second column provides brief descriptions thereof, the third column presents
appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth
column presents, where applicable, the scores, probability values, and other parameters used to evaluate
the strength of a match between two sequences (the higher the score, the greater the homology between
two sequences).

DETAILED DESCRIPTION OF THE INVENTION 

Before the nucleic acid sequences and methods are presented, it is to be understood that this
invention is not limited to the particular machines, methods, and materials described. Although
particular embodiments are described, machines, methods, and materials similar or equivalent to these
embodiments may be used to practice the invention. The preferred machines, methods, and materials
set forth are not intended to limit the scope of the invention which is limited only by the appended
claims.

The singular forms "a", "an", and "the" include plural reference unless the context clearly
dictates otherwise. All technical and scientific terms have the meanings commonly understood by one
of ordinary skill in the art. All publications are incorporated by reference for the purpose of describing
and disclosing the cell lines, vectors, and methodologies which are presented and which might be used in
connection with the invention. Nothing in the specification is to be construed as an admission that the
invention is not entitled to antedate such disclosure by virtue of prior invention.

Definitions 

As used herein, the lower case "dithp" refers to a nucleic acid sequence, while the upper case
"DITHP" refers to an amino acid sequence encoded by dithp. A "full length" dithp refers to a nucleic
acid sequence containing the entire coding region of a gene endogenously expressed in human tissue.

"Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and
surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole
limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological
response.

"Allele" refers to an alternative form of a nucleic acid sequence. Alleles result from a
"mutation," a change or an alternative reading of the genetic code. Any given gene may have none, one,
or many allelic forms. Mutations which give rise to alleles include deletions, additions, or substitutions
of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more
times in a given nucleic acid sequence. The present invention encompasses allelic dithp.

"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or
synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid
sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid
sequence.

"Amplification" refers to the production of additional copies of a sequence and is carried out
using polymerase chain reaction (PCR) technologies well known in the art.

"Antibody" refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, and
Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind DITHP
polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of
interest as the immunizing antigen. The polypeptide or peptide used to immunize an animal (e.g., a
mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and
can be conjugated to a carrier protein if desired. Commonly used carriers that are chemically coupled
to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The
coupled peptide is then used to immunize the animal.

"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target
sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such
as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as
phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified
sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having
modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.

"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target
sequence. The antisense sequence can be DNA, RNA, or any nucleic acid mimic or analog.

"Antisense technology" refers to any technology which relies on the specific hybridization of an
antisense sequence to a target sequence.

A "bin" is a portion of computer memory space used by a computer program for storage of
data, and bounded in such a manner that data stored in a bin may be retrieved by the program.

"Biologically active" refers to an amino acid sequence having a structural, regulatory, or
biochemical function of a naturally occurring amino acid sequence.

"Clone joining" is a process for combining gene bins based upon the bins' containing sequence
information from the same clone. The sequences may assemble into a primary gene transcript as well
as one or more splice variants.

"Complementary" describes the relationship between two single-stranded nucleic acid
sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5').
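
The base-pairing relationship in this definition can be illustrated with a short sketch. The function names below are illustrative only and are not part of the specification; the sketch assumes plain DNA strings containing only A, C, G, and T.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement, read in the opposite (3'->5') direction."""
    return seq.translate(_DNA_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    # 5'-A-G-T-3' pairs with 3'-T-C-A-5', which is 5'-A-C-T-3' when written 5'->3'.
    print(complement("AGT"))          # TCA
    print(reverse_complement("AGT"))  # ACT
```

The distinction between the two functions matters in practice: the complement as written in the definition reads 3' to 5', while sequence databases conventionally store strands 5' to 3'.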

A "component sequence" is a nucleic acid sequence selected by a computer program such as
PHRED and used to assemble a consensus or template sequence from one or more component
sequences.

A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been
assembled from overlapping sequences, using a computer program for fragment assembly such as the
GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a
relational database management system (RDMS).

"Conservative amino acid substitutions" are those substitutions that, when made, least interfere
with the properties of the original protein, i.e., the structure and especially the function of the protein is
conserved and not significantly changed by such substitutions. The table below shows amino acids
which may be substituted for an original amino acid in a protein and which are regarded as conservative
substitutions.



Original Residue      Conservative Substitution
Ala                   Gly, Ser
Arg                   His, Lys
Asn                   Asp, Gln, His
Asp                   Asn, Glu
Cys                   Ala, Ser
Gln                   Asn, Glu, His
Glu                   Asp, Gln, His
Gly                   Ala
His                   Asn, Arg, Gln, Glu
Ile                   Leu, Val
Leu                   Ile, Val
Lys                   Arg, Gln, Glu
Met                   Leu, Ile
Phe                   His, Met, Leu, Trp, Tyr
Ser                   Cys, Thr
Thr                   Ser, Val
Trp                   Phe, Tyr
Tyr                   His, Phe, Trp
Val                   Ile, Leu, Thr



Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in
the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or
hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
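
As an illustration of how the substitution table might be used in practice, the following sketch encodes it as a lookup. The dictionary transcribes the table above using standard three-letter residue codes; the names `CONSERVATIVE_SUBSTITUTIONS` and `is_conservative` are hypothetical and not part of the specification.

```python
# Conservative substitutions, keyed by original residue (three-letter codes).
# Note that the table is directional: Cys may be replaced by Ala, but Ala is
# not listed as replaceable by Cys.
CONSERVATIVE_SUBSTITUTIONS = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if `replacement` is listed as conservative for `original`."""
    return replacement in CONSERVATIVE_SUBSTITUTIONS.get(original, set())
```

Residues absent from the table (for example Pro) simply have no listed conservative substitutions in this sketch.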

"Dd^on" refers to a change in cither a nuddc or amino add sequence in whidi at least one 
nucleotide or amino acid residue, respectivdy, is absent. 

"Derivative" refers to the chemical modification of a nucldc add sequence, such as by 
replacement of hydrogen by an alkyl, acyl, amino, hydroxy], or Otter group. 

"E-value" refers to the statistical probability that a match between two sequences occurred by 

chance. 

A "fragment" is a unique portion of dithp or DITHP which is identical in sequence to but
shorter in length than the parent sequence. A fragment may comprise up to the entire length of the
defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise
from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer,
antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50,
60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length.
Fragments may be preferentially selected from certain regions of a molecule. For example, a
polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first
250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence.
Clearly these lengths are exemplary, and any length that is supported by the specification, including the
Sequence Listing and the figures, may be encompassed by the present embodiments.
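
The length constraints in this definition can be expressed as a simple check. The sketch below is illustrative only: the function name and the default minimum length of 10 are assumptions, and it tests only contiguity and length, not whether the fragment is unique within a genome.

```python
def is_fragment(candidate: str, parent: str, min_length: int = 10) -> bool:
    """Return True if `candidate` is a contiguous stretch of `parent` that is
    at least `min_length` residues long and at least one residue shorter
    than `parent`."""
    return min_length <= len(candidate) < len(parent) and candidate in parent


if __name__ == "__main__":
    parent = "ATGGCCATTGTAATGGGCCGC"
    print(is_fragment(parent[3:18], parent))  # True: 15 contiguous nucleotides
    print(is_fragment(parent, parent))        # False: not shorter than the parent
```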

A fragment of dithp comprises a region of unique polynucleotide sequence that specifically
identifies dithp, for example, as distinct from any other sequence in the same genome. A fragment of
dithp is useful, for example, in hybridization and amplification technologies and in analogous methods
that distinguish dithp from related polynucleotide sequences. The precise length of a fragment of dithp
and the region of dithp to which the fragment corresponds are routinely determinable by one of ordinary
skill in the art based on the intended purpose for the fragment.

A fragment of DITHP is encoded by a fragment of dithp. A fragment of DITHP comprises a
region of unique amino acid sequence that specifically identifies DITHP. For example, a fragment of
DITHP is useful as an immunogenic peptide for the development of antibodies that specifically
recognize DITHP. The precise length of a fragment of DITHP and the region of DITHP to which the
fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended
purpose for the fragment.

A "full length" nucleotide sequence is one containing at least a start site for translation to a
protein sequence, followed by an open reading frame and a stop site, and encoding a "full length"
polypeptide.

"Hit" refers to a sequence whose annotation will be used to describe a given template. Criteria
for selecting the top hit are as follows: if the template has one or more exact nucleic acid matches, the
top hit is the exact match with highest percent identity. If the template has no exact matches but has
significant protein hits, the top hit is the protein hit with the lowest E-value. If the template has no
significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide
hit with the lowest E-value.
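
The three-tier rule for choosing a top hit can be sketched as follows. This is a minimal illustration, not the procedure actually used to build Table 1: the `Hit` record, its field names, and the way "exact" and "significant" are flagged are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Hit:
    accession: str           # hypothetical identifier for the matched sequence
    kind: str                # "nucleotide" or "protein"
    exact: bool              # True for an exact nucleic acid match
    significant: bool        # True if the match passed a significance cutoff
    percent_identity: float
    e_value: float


def select_top_hit(hits: Sequence[Hit]) -> Optional[Hit]:
    """Apply the top-hit criteria in order of priority."""
    # 1. Exact nucleic acid matches: take the highest percent identity.
    exact = [h for h in hits if h.kind == "nucleotide" and h.exact]
    if exact:
        return max(exact, key=lambda h: h.percent_identity)
    # 2. Significant protein hits: take the lowest E-value.
    proteins = [h for h in hits if h.kind == "protein" and h.significant]
    if proteins:
        return min(proteins, key=lambda h: h.e_value)
    # 3. Significant non-exact nucleotide hits: take the lowest E-value.
    nucleotides = [
        h for h in hits
        if h.kind == "nucleotide" and h.significant and not h.exact
    ]
    if nucleotides:
        return min(nucleotides, key=lambda h: h.e_value)
    return None  # no usable annotation for this template
```

Returning `None` when no tier applies is a design choice of the sketch; the text does not say how templates without any significant hit are annotated.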

"Homology" refers to sequence similarity dther between a reference nucldc add sequence and 
at least a fragment of a (Mvp or b^eai a reference amino acid sequence and a fragment of a DITHP. 

"Hybridization" refers to the process by which a strand of nucleotides anneals with a 
con5)lemaitary strand through base pairing. Spedfic hybridization is an indication that two nucldc 
10 add sequences share a high degree of identity. Specific hybridization complexes form under defined 
annealing conditions, and ranain hybridized after the "washing" step. The defined hybridization 
conditions include the annealing conditions and the washing slep(s\ the 1att«* of which is particularly 
inq)Qrtant in determining the stringency of the hybridization process, with more stringent conditions 
allowing less non-spedfic binding, i.e., binding between pairs of nucldc acid probes that are not 
1 5 perfectly matched Permissive conditions for annealing of nucleic acid sequaices are routinely 

determinable and may be consistent among hybridization expmmaits, whereas wash conditions may be 
varied among experiments to achieve the desired stringency. 

Generally, stringency of hybridization is expressed with reference to the temperature under
which the wash step is carried out. Generally, such wash temperatures are selected to be about 5°C to
10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and
pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target
sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for
nucleic acid hybridization is well known and can be found in Sambrook et al., 1989, Molecular
Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; specifically
see volume 2, chapter 9.
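
As a small worked example of the wash-temperature rule, the sketch below returns the window lying a fixed offset below a supplied Tm, reading the range in the text as about 5°C to 10°C. The function name and the offset parameters are illustrative assumptions. No Tm formula is assumed here; the Tm must be obtained separately, for example from the equation in the cited reference.

```python
from typing import Tuple


def wash_temperature_range(
    tm_celsius: float,
    min_offset: float = 5.0,
    max_offset: float = 10.0,
) -> Tuple[float, float]:
    """Return (lowest, highest) wash temperature in degrees Celsius,
    i.e. the window from Tm - max_offset to Tm - min_offset."""
    if min_offset > max_offset:
        raise ValueError("min_offset must not exceed max_offset")
    return (tm_celsius - max_offset, tm_celsius - min_offset)


if __name__ == "__main__":
    # For a probe with a Tm of 75 degrees C, wash between 65 and 70 degrees C.
    print(wash_temperature_range(75.0))  # (65.0, 70.0)
```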

High stringency conditions for hybridization between polynucleotides of the present invention
include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour.
Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be
varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%. Typically, blocking reagents
are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured
salmon sperm DNA at about 100-200 ng/ml. Useful variations on these conditions will be readily
apparent to those skilled in the art. Hybridization, particularly under high stringency conditions, may
be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative
of a similar role for the nucleotides and their resultant proteins.

Other parameters, such as temperature, salt concentration, and detergent concentration may be
varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of about
35-50% v/v, may also be used under particular circumstances, such as RNA:DNA hybridizations.
Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.

"Immunogenic" describes the potential for a natural, recombinant, or synthetic peptide, epitope,
polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell lines.

"Insertion" or "addition" refers to a change in either a nucleic or amino acid sequence in which
at least one nucleotide or residue, respectively, is added to the sequence.

"Labeling" refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or
antibody with a reporter molecule capable of producing a detectable or measurable signal.

"Microarray" is any arrangement of nucleic acids, amino acids, antibodies, etc., on a substrate.
The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or an
appropriate membrane.

The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other
chemical compound having a unique and defined position on a microarray.

"Linkers" are short stretches of nucleotide sequence which may be added to a vector or a dithp
to create restriction endonuclease sites to facilitate cloning. "Polylinkers" are engineered to incorporate
multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs
(e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV, SnaBI, and StuI).

"Naturally occurring" refers to an endogenous polynucleotide or polypeptide that may be
isolated from viruses or prokaryotic or eukaryotic cells.

"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester
bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid
sequence can be considered an oligomer, oligonucleotide, or polynucleotide. The nucleic acid can be
DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be
either double-stranded or single-stranded, and can represent either the sense or antisense
(complementary) strand.

"Oligomer" refers to a nucleic acid sequence of at least about 6 nucleotides and as many as
about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 and
30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may be used
as, e.g., primers for PCR, and are usually chemically synthesized.

"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a
functional relationship with the second nucleic acid sequence. For instance, a promoter is operably
linked to a coding sequence if the promoter affects the transcription or expression of the coding
sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and,
where necessary to join two protein coding regions, in the same reading frame.

"Peptide nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached to
a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can prevent
gene expression by targeting complementary messenger RNA.

The phrases "percent identity" and "% identity", as applied to polynucleotide sequences, refer
to the percentage of residue matches between at least two polynucleotide sequences aligned using a
standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in
the sequences being compared in order to optimize alignment between two sequences, and therefore
achieve a more meaningful comparison of the two sequences.
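
To make the definition concrete, the sketch below computes percent identity for two sequences that have already been aligned, with "-" marking inserted gaps. It is not the CLUSTAL V or BLAST calculation: the function name is hypothetical, and the choice of denominator (all alignment columns, excluding columns where both sequences carry a gap) is an assumption, since alignment programs differ on this point.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences have a gap; they carry no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Seven of nine columns match (one gap, one mismatch).
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))  # 77.8
```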

Percent identity between polynucleotide sequences may be determined using the default
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence
alignment program. This program is part of the LASERGENE software package, a suite of molecular
biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in Higgins, D.G.
and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992) CABIOS 8:189-191.
For pairwise alignments of polynucleotide sequences, the default parameters are set as follows:
Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted" residue weight table is
selected as the default. Percent identity is reported by CLUSTAL V as the "percent similarity" between
aligned polynucleotide sequence pairs.

Alternatively, a suite of commonly used and freely available sequence comparison algorithms is
provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search
Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several
sources, including the NCBI, Bethesda, MD, and on the Internet at
http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis
programs including "blastn," that is used to determine alignment between a known polynucleotide
sequence and other sequences on a variety of databases. Also available is a tool called "BLAST 2
Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2
Sequences" can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2/. The
"BLAST 2 Sequences" tool can be used for both blastn and blastp (discussed below). BLAST
programs are commonly used with gap and other parameters set to default settings. For example, to
compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences" tool Version
2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example:

Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 11
Filter: on

Percent identity may be measured over the length of an entire defined sequence, for example, as
defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over
the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at
least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such
lengths are exemplary only, and it is understood that any fragment length supported by the sequences
shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage
identity may be measured.

Nucleic acid sequences that do not show a high degree of identity may nevertheless encode
similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in
nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences
that all encode substantially the same protein.
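
The effect of codon degeneracy can be shown with a short sketch that translates two different DNA sequences with the standard genetic code. The example sequences are invented for illustration and are not taken from the Sequence Listing; the helper names are likewise hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    seq_1 = "ATGGCTCTGTCTAGA"
    seq_2 = "ATGGCATTAAGCCGT"  # differs from seq_1 at many positions
    print(translate(seq_1))  # MALSR
    print(translate(seq_2))  # MALSR
```

Although the two nucleotide sequences differ at more than half of their positions, both encode the same five-residue peptide.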

The phrases "percent identity" and "% identity", as applied to polypeptide sequences, refer to
the percentage of residue matches between at least two polypeptide sequences aligned using a
standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment
methods take into account conservative amino acid substitutions. Such conservative substitutions,
explained in more detail above, generally preserve the hydrophobicity and acidity of the substituted
residue, thus preserving the structure (and therefore function) of the folded polypeptide.

Percent identity between polypeptide sequences may be determined using the default parameters
of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment
program (described and referenced above). For pairwise alignments of polypeptide sequences using
CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap penalty=3, window=5, and
"diagonals saved"=5. The PAM250 matrix is selected as the default residue weight table. As with
polynucleotide alignments, the percent identity is reported by CLUSTAL V as the "percent similarity"
between aligned polypeptide sequence pairs.

Alternatively the NCBI BLAST software suite may be used. For example, for a pairwise
comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9
(May-07-1999) with blastp set at default parameters. Such default parameters may be, for example:

Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalty
Gap x drop-off: 50
Expect: 10
Word Size: 3
Filter: on

Percent identity may be measured over the length of an entire defined polypeptide sequence, for
example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for
example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance,
a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150
contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length
supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a
length over which percentage identity may be measured.

"Post-translational modification" of a DITHP may involve lipidation, glycosylation,
phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the
art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by
cell type depending on the enzymatic milieu and the DITHP.

"Probe" refers to dithp or fragments thereof, which are used to detect identical, allelic or related
nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable
label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent
agents, and enzymes. "Primers" are short nucleic acids, usually DNA oligonucleotides, which may be
annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended
along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification
(and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).

Probes and primers as used in the present invention typically comprise at least 15 contiguous
nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also
be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or
at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may
be considerably longer than these examples, and it is understood that any length supported by the
specification, including the figures and Sequence Listing, may be used.

Methods for preparing and using probes and primers are described in the references, for
example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold
Spring Harbor Press, Plainview NY; Ausubel et al., 1987, Current Protocols in Molecular Biology,
Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis et al., 1990, PCR Protocols, A Guide
to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be derived from
a known sequence, for example, by using computer programs intended for that purpose such as Primer
(Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).

Oligonucleotides for use as primers are selected using software known in the art for such
purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to
100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000
nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection
programs have incorporated additional features for expanded capabilities. For example, the PrimOU
primer selection program (available to the public from the Genome Center at University of Texas South
West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences
and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection
program (available to the public from the Whitehead Institute/MIT Center for Genome Research,
Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer
binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for
microarrays. (The source code for the latter two primer selection programs may also be obtained from
their respective sources and modified to meet the user's specific needs.) The PrimeGen program
(available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge
UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that
hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences.
Hence, this program is useful for identification of both unique and conserved oligonucleotides and
polynucleotide fragments. The oligonucleotides and polynucleotide fragments identified by any of the
above selection methods are useful in hybridization technologies, for example, as PCR or sequencing
primers, microarray elements, or specific probes to identify fully or partially complementary
polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to
those described above.

"Purified" refers to molecules, either polynucleotides or polypeptides that are isolated or
separated from their natural environment and are at least 60% free, preferably at least 75% free, and
most preferably at least 90% free from other compounds with which they are naturally associated.

A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence
that is made by an artificial combination of two or more otherwise separated segments of sequence.
This artificial combination is often accomplished by chemical synthesis or, more commonly, by the
artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques
such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have
been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a
recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence.
Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.

Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a
vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is
expressed, inducing a protective immunological response in the mammal.

"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene,
and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host
proteins to carry out or regulate transcription or translation.

"Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, an
amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or
chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in
the art.

An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear
sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the
nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose
instead of deoxyribose.
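
At the level of sequence notation, the RNA equivalent amounts to a one-for-one replacement of T with U. The sketch below is illustrative; the function name is an assumption, and the ribose versus deoxyribose difference has no counterpart in a text representation.

```python
def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence: same order, thymine -> uracil."""
    dna = dna.upper()
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    return dna.replace("T", "U")


if __name__ == "__main__":
    print(rna_equivalent("ATGCTT"))  # AUGCUU
```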

"Sample" is used in its broadest sense. Samples may contain nucleic or amino acids,
antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not
limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell;
genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots
or imprints from such cells or tissues).

"Specific binding" or "specifically binding" refers to the interaction between a protein or
peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent
upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope,
recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the
presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction
containing free labeled A and the antibody will reduce the amount of labeled A that binds to the
antibody.

"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different
nucleotide or amino acid.

"Substrate" refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters,
chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers,
microparticles or capillaries. The substrate can have a variety of surface forms, such as wells,
trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.

A "transcript image" refers to the collective pattern of gene expression by a particular tissue or 
cell type under given conditions at a given time. 

"Transformation" refers to a process by which exogenous DNA enters a recipient cell. 
Transformation may occur under natural or artificial conditions using various methods well known in 
the art. Transformation may rely on any known method for the insertion of foreign nucleic acid 
sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being 
transformed. 

"Transformants" include stably transformed cells in which the inserted DNA is capable of 
replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as 
cells which transiently express inserted DNA or RNA. 

A "transgenic organism," as used herein, is any organism, including but not limited to animals 
and plants, in which one or more of the cells of the organism contains heterologous nucleic acid 
introduced by way of human intervention, such as by transgenic techniques well known in the art. The 
nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, 
by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant 
virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, 
but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms 
contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants 
and animals. The isolated DNA of the present invention can be introduced into the host by methods 
known in the art, for example infection, transfection, transformation or transconjugation. Techniques 
for transferring the DNA of the present invention into such organisms are widely known and provided in 
references such as Sambrook et al. (1989), supra. 

A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at 
least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the 
nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) 
set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at least 
50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or even at least 98% or 
greater sequence identity over a certain defined length. The variant may result in "conservative" amino 
acid changes which do not affect structural and/or chemical properties. A variant may be described as, 
for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice 
variant may have significant identity to a reference molecule, but will generally have a greater or lesser 
number of polynucleotides due to alternate splicing of exons during mRNA processing. The 
corresponding polypeptide may possess additional functional domains or lack domains that are present 
in the reference molecule. Species variants are polynucleotide sequences that vary from one species to 
another. The resulting polypeptides generally will have significant amino acid identity relative to each 
other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between 
individuals of a given species. Polymorphic variants also may encompass "single nucleotide 
polymorphisms" (SNPs) in which the polynucleotide sequence varies by one base. The presence of 
SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a 
disease state. 

In an alternative, variants of the polynucleotides of the present invention may be generated 
through recombinant methods. One possible method is a DNA shuffling technique such as 
MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number 
5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat. 
Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve 
the biological properties of DITHP, such as its biological or enzymatic activity or its ability to bind to 
other molecules or compounds. DNA shuffling is a process by which a library of gene variants is 
produced using PCR-mediated recombination of gene fragments. The library is then subjected to 
selection or screening procedures that identify those gene variants with the desired properties. These 
preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and 
selection/screening. Thus, genetic diversity is created through "artificial" breeding and rapid molecular 
evolution. For example, fragments of a single gene containing random point mutations may be 
recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, 
fragments of a given gene may be recombined with fragments of homologous genes in the same gene 
family, either from the same or different species, thereby maximizing the genetic diversity of multiple 
naturally occurring genes in a directed and controllable manner. 
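
By way of non-limiting illustration, the recursive cycle of recombination, screening, and pooling described above can be modeled as a toy in silico loop. This is only a sketch under stated assumptions and is not the MOLECULARBREEDING procedure: the parents are assumed to be equal-length strings, recombination is modeled as choosing each fixed-length segment from a random parent, and the names `shuffle_once`, `evolve`, `segment_len`, and the caller-supplied `score` function are all hypothetical.

```python
import random


def shuffle_once(parents, segment_len, rng):
    """Build one chimeric sequence by taking each segment from a random parent."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model assumes parents of equal length")
    pieces = []
    for start in range(0, length, segment_len):
        donor = rng.choice(parents)
        pieces.append(donor[start:start + segment_len])
    return "".join(pieces)


def evolve(parents, score, rounds=3, library_size=20, keep=4, segment_len=3, seed=0):
    """Recursive rounds of shuffling followed by screening with `score`."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        # Recombination step: generate a library of variants from the pool.
        library = [shuffle_once(pool, segment_len, rng) for _ in range(library_size)]
        # Screening step: keep the best scorers; ties are broken alphabetically
        # so the result is reproducible for a given seed.
        candidates = set(library) | set(pool)
        pool = sorted(candidates, key=lambda s: (-score(s), s))[:keep]
    return pool


if __name__ == "__main__":
    # Hypothetical screen: count of 'A' stands in for a measured property.
    best = evolve(["AAACCC", "CCCAAA"], score=lambda s: s.count("A"))
    print(best[0])
```

Because the current pool is carried into each screening step, the best score in the pool cannot decrease from one round to the next in this model.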

A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having 
at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of 
the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) 
set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at 
least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% or greater sequence 
identity over a certain defined length of one of the polypeptides. 
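
By way of non-limiting illustration, the percent identity thresholds recited in the two "variant" definitions (25% for nucleic acids, 40% for polypeptides) can be applied to a pair of sequences that have already been aligned. The sketch below does not perform a BLAST alignment; it assumes the caller supplies two aligned strings of equal length with "-" marking gaps, and it assumes that gap columns count toward the aligned length. The names `percent_identity`, `is_variant`, and `MIN_IDENTITY` are hypothetical.

```python
# Minimum percent identity taken from the definitions of "variant" above.
MIN_IDENTITY = {"nucleic_acid": 25.0, "polypeptide": 40.0}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def is_variant(aligned_a: str, aligned_b: str, kind: str) -> bool:
    """True if the aligned pair meets the minimum identity for the given kind."""
    return percent_identity(aligned_a, aligned_b) >= MIN_IDENTITY[kind]


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACGTTA"))
    print(is_variant("ACGT", "TGCA", "nucleic_acid"))
```

Other conventions for treating gap columns would give slightly different values, so the figure reported by a given alignment tool should be taken as controlling.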

THE INVENTION 

In a particular embodiment, cDNA sequences derived from human tissues and cell lines were 
aligned based on nucleotide sequence identity and assembled into "consensus" or "template" sequences 
which are designated by the template identification numbers (template IDs) in column 2 of Table 1. 
The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are shown in 
column 1. The template sequences have similarity to GenBank sequences, or "hits," as designated by 
the GI Numbers in column 3. The statistical probability of each GenBank hit is indicated by a 
probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed 
in column 5. 

The invention incorporates the nucleic acid sequences of these templates as disclosed in the 
Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states 
characterized by defects in human molecules. The invention further utilizes these sequences in 
hybridization and amplification technologies, and in particular, in technologies which assess gene 
expression patterns correlated with specific cells or tissues and their responses in vivo or in vitro to 
pharmaceutical agents, toxins, and other treatments. In this manner, the sequences of the present 
invention are used to develop a transcript image for a particular cell or tissue. 

Derivation of Nucleic Acid Sequences 

cDNA was isolated from libraries constructed using RNA derived from normal and diseased 
human tissues and cell lines. The human tissues and cell lines used for cDNA library construction were 
selected from a broad range of sources to provide a diverse population of cDNAs representative of gene 
transcription throughout the human body. Descriptions of the human tissues and cell lines used for 
cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo 
Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, 
endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and 
urologic sources. 

Cell lines used for cDNA library construction were derived from, for example, leukemic cells, 
teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such 
cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines 
commonly used and available from public depositories (American Type Culture Collection, Manassas 
VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 
5-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of 
leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress. 

Sequencing of the cDNAs 

Methods for DNA sequencing are well known in the art. Conventional enzymatic methods 
employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. 
Biochemical Corporation, Cleveland OH), Taq polymerase (PE Biosystems, Foster City CA), 
thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), 
Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in 
the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg MD), 
to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA template of 
interest. Methods have been developed for the use of both single-stranded and double-stranded 
templates. Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels 
and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for 
fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, 
and analysis using fluorescence detection methods have been developed. Machines used to prepare 
cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company 
(Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown 
MA), and ABI CATALYST 800 thermal cycler (PE Biosystems). Sequencing can be carried out using, 
for example, the ABI 373 or 377 (PE Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. 
(Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual 
sequencing systems well known in the art. 

The nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, 
automated methods and, as such, may contain occasional sequencing errors or unidentified 
nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases 
do not represent a hindrance to practicing the invention for those skilled in the art. Several methods 
employing standard recombinant techniques may be used to correct errors and complete the missing 
sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in 
Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular 
Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.) 

Assembly of cDNA Sequences 

Human polynucleotide sequences may be assembled using programs or algorithms well known 
in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single 
or many different transcripts. Assembly of the sequences can be performed using such programs as 
PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or 
other methods known in the art. 

Alternatively, cDNA sequences are used as "component" sequences that are assembled into 
"template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified, and 
quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known 
as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA). A series 
of BLAST comparisons is performed and low-information segments and repetitive elements (e.g., 
dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. 
Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences are then 
loaded into a relational database management system (RDMS) which assigns edited sequences to 
existing templates, if available. When additional sequences are added into the RDMS, a process is 
initiated which modifies existing templates or creates new templates from works in progress (i.e., 
nonfinal assembled sequences) containing queued sequences or the sequences themselves. After the new 
sequences have been assigned to templates, the templates can be merged into bins. If multiple templates 
exist in one bin, the bin can be split and the templates reannotated. 
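
By way of non-limiting illustration, the masking step in the foregoing paragraph, in which repetitive elements are replaced by "n's", can be sketched for the simple case of dinucleotide repeats. The sketch substitutes a regular expression for the BLAST comparisons described above and does not address Alu repeats or the removal of mitochondrial and ribosomal sequences. The function name `mask_dinucleotide_repeats` and the default threshold of four tandem copies are assumptions made only for this example.

```python
import re


def mask_dinucleotide_repeats(seq: str, min_copies: int = 4) -> str:
    """Replace tandem dinucleotide repeats with the same number of 'n' characters.

    A run is masked when a two-base unit occurs at least `min_copies`
    times in a row; masking preserves sequence length and coordinates.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    pattern = re.compile(
        rf"([ACGT]{{2}})\1{{{min_copies - 1},}}", re.IGNORECASE
    )
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)


if __name__ == "__main__":
    print(mask_dinucleotide_repeats("GGCACACACACATT"))
```

Because masked positions keep their place in the sequence, downstream comparisons can skip the masked stretch without shifting the coordinates of the remaining bases.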

Once gene bins have been generated based upon sequence alignments, bins are "clone joined" 
based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one 
bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins 
should be merged into a single bin. Only bins which share at least two different clones are merged. 
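
By way of non-limiting illustration, the clone joining rule described above can be sketched as follows. The data layout is an assumption made for this example only: each bin is represented as a set of (clone identifier, read end) pairs, where the read end is "5" or "3". The function name `clone_join` and the parameter `min_shared` are hypothetical, and the sketch evaluates the original bins pairwise in a single pass rather than re-evaluating merged bins.

```python
from itertools import combinations


def clone_join(bins, min_shared=2):
    """Merge bins linked by at least `min_shared` different clones.

    `bins` maps a bin identifier to a set of (clone_id, end) pairs.
    Two bins are linked by a clone when one bin holds its 5' read and
    the other holds its 3' read. Returns a list of merged read sets.
    """
    parent = {b: b for b in bins}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(bins, 2):
        linking_clones = set()
        for clone, end in bins[a]:
            opposite = "3" if end == "5" else "5"
            if (clone, opposite) in bins[b]:
                linking_clones.add(clone)
        if len(linking_clones) >= min_shared:
            parent[find(a)] = find(b)

    merged = {}
    for b in bins:
        merged.setdefault(find(b), set()).update(bins[b])
    return list(merged.values())


if __name__ == "__main__":
    example = {
        "bin1": {("c1", "5"), ("c2", "5")},
        "bin2": {("c1", "3"), ("c2", "3")},
        "bin3": {("c3", "5")},
        "bin4": {("c3", "3")},
    }
    print(sorted(len(group) for group in clone_join(example)))
```

In the example, bin1 and bin2 are linked by two different clones and are merged, while bin3 and bin4 are linked by only one clone and remain separate.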

A resultant template sequence may contain either a partial or a full length open reading frame, 
or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length 
cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With 
current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of 
vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" 
synthesis. Template sequences may be extended to include additional contiguous sequences derived 
from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension 
may thus be used to achieve the full length coding sequence of a gene. 

Analysis of the cDNA Sequences 

The cDNA sequences are analyzed using a variety of programs and algorithms which are well 
known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular 
Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 5.) These analyses 
comprise both reading frame determinations, e.g., based on triplet codon periodicity for particular 
organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop 
codons; and homology searches. 
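
By way of non-limiting illustration, the analysis of potential start and stop codons mentioned above can be sketched as a scan of the three forward reading frames. This sketch is not the codon periodicity method of Fickett; it assumes the standard start codon ATG and the standard stop codons TAA, TAG, and TGA, examines only the given strand, and uses the hypothetical names `find_orfs` and `min_codons`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1):
    """List open reading frames on the forward strand.

    Returns (frame, start, end) tuples, where seq[start:end] begins with
    ATG and ends with the first in-frame stop codon, and where at least
    `min_codons` codons (including ATG) precede the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATAGGG"
    for frame, start, end in find_orfs(sequence):
        print(frame, sequence[start:end])
```

A start codon with no downstream in-frame stop codon is not reported, which corresponds to the partial open reading frames that a template sequence may contain.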

Computer programs known to those of skill in the art for performing computer-assisted 
searches for amino acid and nucleic acid sequence similarity include, for example, Basic Local 
Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. 
(1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and 
comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal 
and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, 
S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool (e.g., 
BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be searched for 
sequences containing regions of homology to a query dithp or DITHP of the present invention. 

Other approaches to the identification, assembly, storage, and display of nucleotide and 
polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information," 
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence 
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for 
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, 
all of which are incorporated by reference herein in their entirety. 

Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, 
BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in 
"Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," 
U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference. 

Identification of Human Diagnostic and Therapeutic Molecules Encoded by dithp 

The identities of the DITHP encoded by the dithp of the present invention were obtained by 
analysis of the assembled cDNA sequences. SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID 
NO:4, and SEQ ID NO:5 encode, for example, human enzyme molecules. SEQ ID NO:6 and SEQ ID 
NO:7 encode, for example, receptor molecules. SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ 
ID NO:11, and SEQ ID NO:12 encode, for example, intracellular signaling molecules. SEQ ID 
NO:13 encodes, for example, a membrane transport molecule. SEQ ID NO:14, SEQ ID NO:15, SEQ 
ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, and SEQ ID NO:20 encode, for 
example, nucleic acid synthesis and modification molecules. SEQ ID NO:21 and SEQ ID NO:22 
encode, for example, adhesion molecules. SEQ ID NO:23 and SEQ ID NO:24 encode, for example, 
electron transfer associated molecules. SEQ ID NO:25 encodes, for example, a secreted/extracellular 
matrix molecule. SEQ ID NO:26 and SEQ ID NO:27 encode, for example, cytoskeletal molecules. 
SEQ ID NO:28 and SEQ ID NO:29 encode, for example, cell membrane molecules. SEQ ID NO:30 
and SEQ ID NO:31 encode, for example, ribosomal molecules. SEQ ID NO:32, SEQ ID NO:33, 
SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, 
SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, and SEQ ID NO:43 encode, for example, 
transcription factor molecules. SEQ ID NO:44, SEQ ID NO:45, and SEQ ID NO:46 encode, for 
example, organelle associated molecules. SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, and SEQ 
ID NO:50 encode, for example, biochemical pathway molecules. SEQ ID NO:51 and SEQ ID NO:52 
encode, for example, molecules associated with growth and development. 

Sequences of Human Diagnostic and Therapeutic Molecules 

The dithp of the present invention may be used for a variety of diagnostic and therapeutic 
purposes. For example, a dithp may be used to diagnose a particular condition, disease, or disorder 
associated with human molecules. Such conditions, diseases, and disorders include, but are not limited 
to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, 
cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal 
hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including 
adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in 
particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall 
bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, 
penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; an 
autoimmune/inflammatory disorder, such as inflammation, actinic keratosis, acquired 
immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, 
allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis, 
autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis, 
contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, 
emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, 
Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal 
hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with 
lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis, 
myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis, 
polycythemia vera, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, 
Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, primary 
thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, 
complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic 
cancer including lymphoma, leukemia, and myeloma; an infection caused by a viral agent classified 
as adenovirus, arenavirus, bunyavirus, calicivirus, coronavirus, filovirus, hepadnavirus, herpesvirus, 
flavivirus, orthomyxovirus, parvovirus, papovavirus, paramyxovirus, picornavirus, poxvirus, reovirus, 
retrovirus, rhabdovirus, or togavirus; an infection caused by a bacterial agent classified as 
pneumococcus, staphylococcus, streptococcus, bacillus, corynebacterium, clostridium, 
meningococcus, gonococcus, listeria, moraxella, kingella, haemophilus, legionella, bordetella, 
gram-negative enterobacterium including shigella, salmonella, or campylobacter, pseudomonas, vibrio, 
brucella, francisella, yersinia, bartonella, norcardium, actinomyces, mycobacterium, spirochaetale, 
rickettsia, chlamydia, or mycoplasma; an infection caused by a fungal agent classified as aspergillus, 
blastomyces, dermatophytes, cryptococcus, coccidioides, malassezia, histoplasma, or other 
mycosis-causing fungal agent; and an infection caused by a parasite classified as plasmodium or 
malaria-causing, parasitic entamoeba, leishmania, trypanosoma, toxoplasma, Pneumocystis carinii, intestinal 
protozoa such as giardia, trichomonas, tissue nematode such as trichinella, intestinal nematode such 
as ascaris, lymphatic filarial nematode, trematode such as schistosoma, and cestode such as 
tapeworm; a developmental disorder such as renal tubular acidosis, anemia, Cushing's syndrome, 
achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, epilepsy, gonadal dysgenesis, 
WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), 
Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary 
keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, 
hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, 
spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, and sensorineural hearing 
loss; an endocrine disorder such as a disorder of the hypothalamus and/or pituitary resulting from 
lesions such as a primary brain tumor, adenoma, infarction associated with pregnancy, 
hypophysectomy, aneurysm, vascular malformation, thrombosis, infection, immunological disorder, and 
complication due to head trauma; a disorder associated with hypopituitarism including hypogonadism, 
Sheehan syndrome, diabetes insipidus, Kallman's disease, Hand-Schuller-Christian disease, 
Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism; a disorder associated with 
hyperpituitarism including acromegaly, giantism, and syndrome of inappropriate antidiuretic hormone 
(ADH) secretion (SIADH) often caused by benign adenoma; a disorder associated with hypothyroidism 
including goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis 
associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism; a disorder 
associated with hyperthyroidism including thyrotoxicosis and its various forms, Graves' disease, 
pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease; a disorder 
associated with hyperparathyroidism including Conn disease (chronic hypercalemia); a pancreatic 
disorder such as Type I or Type II diabetes mellitus and associated complications; a disorder associated 
with the adrenals such as hyperplasia, carcinoma, or adenoma of the adrenal cortex, hypertension 
associated with alkalosis, amyloidosis, hypokalemia, Cushing's disease, Liddle's syndrome, and 
Arnold-Healy-Gordon syndrome, pheochromocytoma tumors, and Addison's disease; a disorder 
associated with gonadal steroid hormones such as: in women, abnormal prolactin production, 
infertility, endometriosis, perturbation of the menstrual cycle, polycystic ovarian disease, 
hypoprolactinemia, isolated gonadotropin deficiency, amenorrhea, galactorrhea, hermaphroditism, 
hirsutism and virilization, breast cancer, and, in post-menopausal women, osteoporosis; and, in men, 
Leydig cell deficiency, male climacteric phase, and germinal cell aplasia, a hypergonadal disorder 
associated with Leydig cell tumors, androgen resistance associated with absence of androgen receptors, 
syndrome of 5 α-reductase, and gynecomastia; a metabolic disorder such as Addison's disease, 
cerebrotendinous xanthomatosis, congenital adrenal hyperplasia, coumarin resistance, cystic fibrosis, 
diabetes, fatty hepatocirrhosis, fructose-1,6-diphosphatase deficiency, galactosemia, goiter, 
glucagonoma, glycogen storage diseases, hereditary fructose intolerance, hyperadrenalism, 
hypoadrenalism, hyperparathyroidism, hypoparathyroidism, hypercholesterolemia, hyperthyroidism, 
hypoglycemia, hypothyroidism, hyperlipidemia, hyperlipemia, lipid myopathies, lipodystrophies, 
lysosomal storage diseases, mannosidosis, neuraminidase deficiency, obesity, pentosuria, 
phenylketonuria, pseudovitamin D-deficiency rickets; disorders of carbohydrate metabolism such as 
congenital type II dyserythropoietic anemia, diabetes, insulin-dependent diabetes mellitus, 
non-insulin-dependent diabetes mellitus, fructose-1,6-diphosphatase deficiency, galactosemia, 
glucagonoma, hereditary fructose intolerance, hypoglycemia, mannosidosis, neuraminidase 
deficiency, obesity, galactose epimerase deficiency, glycogen storage diseases, lysosomal storage 
diseases, fructosuria, pentosuria, and inherited abnormalities of pyruvate metabolism; disorders of 
lipid metabolism such as fatty liver, cholestasis, primary biliary cirrhosis, carnitine deficiency, 
carnitine palmitoyltransferase deficiency, myoadenylate deaminase deficiency, hypertriglyceridemia, 
lipid storage disorders such as Fabry's disease, Gaucher's disease, Niemann-Pick's disease, 
metachromatic leukodystrophy, adrenoleukodystrophy, GM2 gangliosidosis, and ceroid 
lipofuscinosis, abetalipoproteinemia, Tangier disease, hyperlipoproteinemia, diabetes mellitus, 
lipodystrophy, lipomatoses, acute panniculitis, disseminated fat necrosis, adiposis dolorosa, lipoid 
adrenal hyperplasia, minimal change disease, lipomas, atherosclerosis, hypercholesterolemia, 
hypercholesterolemia with hypertriglyceridemia, primary hypoalphalipoproteinemia, hypothyroidism, 
renal disease, liver disease, lecithin:cholesterol acyltransferase deficiency, cerebrotendinous 
xanthomatosis, sitosterolemia, hypocholesterolemia, Tay-Sachs disease, Sandhoff's disease, 
hyperlipidemia, hyperlipemia, lipid myopathies, and obesity; and disorders of copper metabolism 
such as Menkes' disease, Wilson's disease, and Ehlers-Danlos syndrome type IX; a neurological 
disorder such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's 
disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal 
disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular 
atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, 
bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative 
intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion 
diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, 
fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, 
tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental 
retardation and other developmental disorder of the central nervous system, cerebral palsy, a 
neuroskeletal disorder, an autonomic nervous system disorder, a cranial nerve disorder, a spinal cord 
disease, muscular dystrophy and other neuromuscular disorder, a peripheral nervous system disorder, 
dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathy, myasthenia 
gravis, periodic paralysis, a mental disorder including mood, anxiety, and schizophrenic disorders, 
seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive 
dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, and Tourette's disorder; a 
gastrointestinal disorder including ulcerative colitis, gastric and duodenal ulcers, cystinuria, 
dibasicaminoaciduria, hypercystinuria, lysinuria, Hartnup disease, tryptophan malabsorption, 
methionine malabsorption, histidinuria, iminoglycinuria, dicarboxylicaminoaciduria, cystinosis, renal 
glycosuria, hypouricemia, familial hypophosphatemic rickets, congenital chloridorrhea, distal renal 
tubular acidosis, Menkes' disease, Wilson's disease, lethal diarrhea, juvenile pernicious anemia, 
folate malabsorption, adrenoleukodystrophy, hereditary myoglobinuria, and Zellweger syndrome; a 
transport disorder such as akinesia, amyotrophic lateral sclerosis, ataxia telangiectasia, cystic fibrosis, 
3 5 Becker's muscular dystrophy. Bell's palsy, Charcot-Marie Tootii disease, diabetes mellitus, diabetes 
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insipidus, diabetic neuropathy, Duchenne muscular dystrophy, hypeikalemic periodic paralysis, 
normokaleinic p^odic paralysis, Parkinson's disease, malignant hypmhennia, multidrug resistance, 
myasthenia gravis, myotonic dystrophy, catatonia, tardive dyskinesia, dystonias, peripheral 
neuropathy, cerebral neoplasms, prostate cancer, cardiac disorders associated with transport, e.g., 
angina, bradyanythmia, tachyarrythmia, hypertension. Long QT syndrome, myocarditis, 
cardiomyopathy, nemaline myopathy, centronuclear myopathy, lipid myopathy, mitochondrial 
myopathy, thyrotoxic myopathy, ethanol myopathy, dermatomyositis, inclusion body myositis, 
infectious myositis, and polymyositis, neurological disorders associated with transport, e.g., 
Alzheimer's disease, amnesia, bipolar disorder, dementia, depression, epilepsy, Tourette's disorder, 
paranoid psychoses, and schizophrenia, and other disorders associated with transport, e.g., 
neurofibromatosis, postherpetic neuralgia, trigeminal neuropathy, sarcoidosis, sickle cell anemia, 
cataracts, infertility, pulmonary artery stenosis, sensorineural autosomal deafness, hyperglycemia, 
hypoglycemia, Graves' disease, goiter, glucose-galactose malabsorption syndrome,
hypercholesterolemia, Cushing's disease, and Addison's disease; and a connective tissue disorder
such as osteogenesis imperfecta, Ehlers-Danlos syndrome, chondrodysplasias, Marfan syndrome,
Alport syndrome, familial aortic aneurysm, achondroplasia, mucopolysaccharidoses, osteoporosis, 
osteopetrosis, Paget's disease, rickets, osteomalacia, hyperparathyroidism, renal osteodystrophy, 
osteonecrosis, osteomyelitis, osteoma, osteoid osteoma, osteoblastoma, osteosarcoma,
osteochondroma, chondroma, chondroblastoma, chondromyxoid fibroma, chondrosarcoma, fibrous 
cortical defect, nonossifying fibroma, fibrous dysplasia, fibrosarcoma, malignant fibrous 
histiocytoma, Ewing's sarcoma, primitive neuroectodermal tumor, giant cell tumor, osteoarthritis,
rheumatoid arthritis, ankylosing spondyloarthritis, Reiter's syndrome, psoriatic arthritis, enteropathic
arthritis, infectious arthritis, gout, gouty arthritis, calcium pyrophosphate crystal deposition disease,
ganglion, synovial cyst, villonodular synovitis, systemic sclerosis, Dupuytren's contracture, hepatic
fibrosis, lupus erythematosus, mixed connective tissue disease, epidermolysis bullosa simplex, bullous
congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and
epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and
white sponge nevus. The dithp can be used to detect the presence of, or to quantify the amount of, a
dithp-related polynucleotide in a sample. This information is then compared to information obtained
from appropriate reference samples, and a diagnosis is established. Alternatively, a polynucleotide
complementary to a given dithp can inhibit or inactivate a therapeutically relevant gene related to the
dithp.

Analysis of dithp Expression Patterns 

The expression of dithp may be routinely assessed by hybridization-based methods to
determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of
dithp expression. For example, the level of expression of dithp may be compared among different cell
types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at
different developmental stages, or among cell types or tissues undergoing various treatments. This type
of analysis is useful, for example, to assess the relative levels of dithp expression in fully or partially
differentiated cells or tissues, to determine if changes in dithp expression levels are correlated with the
development or progression of specific disease states, and to assess the response of a cell or tissue to a
specific therapy, for example, in pharmacological or toxicological studies. Methods for the analysis of
dithp expression are based on hybridization and amplification technologies and include membrane-based
procedures such as northern blot analysis, high-throughput procedures that utilize, for example,
microarrays, and PCR-based procedures.
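
As an illustration of comparing expression levels between diseased and normal tissue, the following minimal sketch computes a log2 fold change from hybridization signal intensities. The function name, the pseudocount, and the sample values are assumptions made for this example and are not taken from the disclosure.

```python
import math


def log2_fold_change(diseased, normal, pseudocount=1.0):
    """Return log2 of the ratio of mean signals (diseased over normal).

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a transcript is not detected in one of the groups.
    """
    mean_diseased = sum(diseased) / len(diseased)
    mean_normal = sum(normal) / len(normal)
    return math.log2((mean_diseased + pseudocount) / (mean_normal + pseudocount))


# Hypothetical signal intensities (arbitrary units) for one dithp probe.
diseased_signal = [31.0, 31.0, 31.0]
normal_signal = [7.0, 7.0, 7.0]
print(log2_fold_change(diseased_signal, normal_signal))
```

A positive value suggests higher expression in the diseased samples and a negative value suggests lower expression; any cutoff for calling a change meaningful would have to be chosen for the assay at hand.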

Hybridization and Genetic Analysis 

The dithp, their fragments, or complementary sequences, may be used to identify the presence
of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The dithp
may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately
selected temperatures and salt concentrations. Hybridization with a probe based on the nucleic acid
sequence of at least one of the dithp allows for the detection of nucleic acid sequences, including
genomic sequences, which are identical or related to the dithp of the Sequence Listing. Probes may be
selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO:1-52
and tested for their ability to identify or amplify the target nucleic acid sequence using standard
protocols.
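
The degree of similarity between two sequences, and a rough hybridization temperature for a short probe, can be estimated as in the sketch below. It assumes the two sequences are already aligned and of equal length, and it uses the Wallace rule of thumb (2 degrees C per A or T, 4 degrees C per G or C), which is only a coarse approximation for short oligonucleotides. The function names and example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two pre-aligned sequences."""
    a, b = seq_a.upper(), seq_b.upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (degrees C) of a short DNA oligo."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


# Hypothetical probe and target differing at a single position.
probe = "ATGCGTACGTTAGC"
target = "ATGCGTACGATAGC"
print(percent_identity(probe, target), wallace_tm(probe))
```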

Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ
ID NO:1-52 and fragments thereof, can be identified using various conditions of stringency. (See, e.g.,
Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods
Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."

A probe for use in Southern or northern hybridization may be derived from a fragment of a
dithp sequence, or its complement, that is up to several hundred nucleotides in length and is either
single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials
such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to
artificial substrates containing dithp. Microarrays are particularly suitable for identifying the presence
of and detecting the level of expression for multiple genes of interest by examining gene expression
correlated with, e.g., various stages of development, treatment with a drug or compound, or disease
progression. An array analogous to a dot or slot blot may be used to arrange and link polynucleotides
to the surface of a substrate using one or more of the following: mechanical (vacuum), chemical,
thermal, or UV bonding procedures. Such an array may contain any number of dithp and may be
produced by hand or by using available devices, materials, and machines.

Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g.,
Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci.
USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al.
(1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155;
and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)

Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially
available reporter molecules. For example, commercial kits are available for radioactive and
chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life
Technologies). Alternatively, dithp may be cloned into commercially available vectors for the
production of RNA probes. Such probes may be transcribed in the presence of at least one labeled
nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).

Additionally, the polynucleotides of SEQ ID NO:1-52 or suitable fragments thereof can be used
to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures well
known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning of such
full length cDNA sequences may employ the method of cDNA library screening with probes using the
hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra,
Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate
genomic sequences of dithp in order to analyze, e.g., regulatory elements.

Genetic Mapping 

Gene identification and mapping are important in the investigation and treatment of almost all
conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis,
diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than
the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being
predictive of predisposition for a particular condition, disease, or disorder. For example,
cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol
from the bloodstream, and diabetes may result when a particular individual's immune system is
activated by an infection and attacks the insulin-producing cells of the pancreas. In some studies,
Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene
and location. Mapping of disease genes is a complex and reiterative process and generally proceeds
from genetic linkage analysis to physical mapping.

As a condition is noted among members of a family, a genetic linkage map traces parts of
chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of
particular conditions to particular regions of chromosomes, as defined by RFLP or other markers.
(See, for example, Lander, E.S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.)
Occasionally, genetic markers and their locations are known from previous studies. More often,
however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic
linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man
(OMIM) World Wide Web site.
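
One common statistic for linking a condition to a marker is the LOD score, which compares the likelihood of the observed recombinants at a recombination fraction theta against the likelihood under free recombination (theta = 0.5). The sketch below assumes phase-known, fully informative meioses, which is a simplification; the function name and the counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for k recombinants in n informative meioses at a given theta."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_likelihood_linked = (recombinants * math.log10(theta)
                             + non_recombinants * math.log10(1.0 - theta))
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Hypothetical family data: 1 recombinant observed in 10 meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

By convention a LOD score of 3 or more is usually taken as evidence for linkage, so a small family such as this one would need to be combined with others.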

In another embodiment of the invention, dithp sequences may be used to generate hybridization
probes useful in chromosomal mapping of naturally occurring genomic sequences. Either coding or
noncoding sequences of dithp may be used, and in some instances, noncoding sequences may be
preferable over coding sequences. For example, conservation of a dithp coding sequence among
members of a multi-gene family may potentially cause undesired cross hybridization during
chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region
of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes
(HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1
constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J. et al. (1997) Nat.
Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J. (1991) Trends Genet.
7:149-154.)

Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome
mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation
between the location of dithp on a physical chromosomal map and a specific disorder, or a
predisposition to a specific disorder, may help define the region of DNA associated with that disorder.
The dithp sequences may also be used to detect polymorphisms that are genetically linked to the
inheritance of a particular condition, disease, or disorder.

In situ hybridization of chromosomal preparations and genetic mapping techniques, such as
linkage analysis using established chromosomal markers, may be used for extending existing genetic
maps. Often the placement of a gene on the chromosome of another mammalian species, such as
mouse, may reveal associated markers even if the number or arm of the corresponding human
chromosome is not known. These new marker sequences can be mapped to human chromosomes and
may provide valuable information to investigators searching for disease genes using positional cloning
or other gene discovery techniques. Once a disease or syndrome has been crudely correlated by genetic
linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences
mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g.,
Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject invention may
also be used to detect differences in chromosomal architecture due to translocation, inversion, etc.,
among normal, carrier, or affected individuals.

Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in
order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated
with disease. This process requires a physical map of the chromosomal region containing the
disease-gene of interest along with associated markers. A physical map is necessary for determining the
nucleotide sequence of and order of marker genes on a particular chromosomal region. Physical
mapping techniques are well known in the art and require the generation of overlapping sets of cloned
DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to
reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that
region is obtained by consulting the catalog and selecting clones from that region. The gene of interest
is located through positional cloning techniques using hybridization or similar methods.
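
The catalog lookup described above can be sketched as follows: clones are stored with their positions on the physical map, and the clones covering a marker are retrieved by position. The clone names, the coordinate units, and the helper functions are hypothetical and serve only to illustrate the idea.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # map position of the clone's first base (arbitrary units)
    end: int    # map position of the clone's last base


def clones_covering(catalog: List[Clone], position: int) -> List[str]:
    """Names of clones whose interval contains the marker position."""
    ordered = sorted(catalog, key=lambda c: c.start)
    return [c.name for c in ordered if c.start <= position <= c.end]


def is_contiguous(catalog: List[Clone]) -> bool:
    """True if the clones overlap into a single uninterrupted contig."""
    ordered = sorted(catalog, key=lambda c: c.start)
    if not ordered:
        return True
    reach = ordered[0].end
    for clone in ordered[1:]:
        if clone.start > reach:  # gap between this clone and the contig so far
            return False
        reach = max(reach, clone.end)
    return True


# Hypothetical catalog of overlapping clones (positions in kb).
catalog = [Clone("BAC-C", 400, 700), Clone("YAC-A", 0, 300), Clone("BAC-B", 250, 450)]
print(clones_covering(catalog, 275), is_contiguous(catalog))
```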

Diagnostic Uses 

The dithp of the present invention may be used to design probes useful in diagnostic assays.
Such assays, well known to those skilled in the art, may be used to detect or confirm conditions,
disorders, or diseases associated with abnormal levels of dithp expression. Labeled probes developed
from dithp sequences are added to a sample under hybridizing conditions of desired stringency. In some
instances, dithp, or fragments or oligonucleotides derived from dithp, may be used as primers in
amplification steps prior to hybridization. The amount of hybridization complex formed is quantified
and compared with standards for that cell or tissue. If dithp expression varies significantly from the
standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or
quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based
technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay
(ELISA)-like, pin, or chip-based assays.
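
One simple way to judge whether a measurement "varies significantly from the standard" is a z-score against a set of standard values. The sketch below is an assumption of this example rather than the method of the disclosure; the cutoff of 2.0 and the signal values are hypothetical.

```python
import statistics


def deviates_from_standard(sample_value, standard_values, z_cutoff=2.0):
    """Return (flag, z) where flag is True if |z| exceeds the cutoff.

    z is the number of sample standard deviations separating the sample
    from the mean of the standards.
    """
    mean = statistics.mean(standard_values)
    spread = statistics.stdev(standard_values)
    if spread == 0:
        raise ValueError("standards show no variation; z-score is undefined")
    z = (sample_value - mean) / spread
    return abs(z) > z_cutoff, z


# Hypothetical hybridization complex signals for normal tissue standards.
standards = [10.0, 12.0, 11.0, 9.0, 13.0]
print(deviates_from_standard(20.0, standards))
print(deviates_from_standard(11.5, standards))
```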

The probes described above may also be used to monitor the progress of conditions, disorders,
or diseases associated with abnormal levels of dithp expression, or to evaluate the efficacy of a
particular therapeutic treatment. The candidate probe may be identified from the dithp that are specific
to a given human tissue and have not been observed in GenBank or other genome databases. Such a
probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of
an individual patient. In a typical process, standard expression is established by methods well known in
the art for use as a basis of comparison, samples from patients affected by the disorder or disease are
combined with the probe to evaluate any deviation from the standard profile, and a therapeutic agent is
administered and effects are monitored to generate a treatment profile. Efficacy is evaluated by
determining whether the expression progresses toward or returns to the standard normal pattern.
Treatment profiles may be generated over a period of several days or several months. Statistical
methods well known to those skilled in the art may be used to determine the significance of such
therapeutic agents.

The polynucleotides are also useful for identifying individuals from minute biological samples,
for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA. The
polynucleotides of the present invention can also be used to determine the actual base-by-base DNA
sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR
primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this
technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID
database is established for an individual, positive identification of that individual can be made from
extremely small tissue samples.
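
Matching RFLP patterns amounts to comparing the fragment lengths produced when two DNA samples are cut with the same restriction enzyme. The sketch below assumes a linear molecule and uses the EcoRI recognition site GAATTC, which is cut between G and A; the sequences and function names are hypothetical.

```python
def digest_lengths(dna: str, site: str = "GAATTC", cut_offset: int = 1):
    """Sorted fragment lengths after cutting a linear DNA at every site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G^AATTC).
    """
    seq = dna.upper()
    cuts = []
    index = seq.find(site)
    while index != -1:
        cuts.append(index + cut_offset)
        index = seq.find(site, index + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def same_rflp_pattern(dna_a: str, dna_b: str) -> bool:
    """True if both samples give the same set of fragment lengths."""
    return digest_lengths(dna_a) == digest_lengths(dna_b)


# Hypothetical alleles: the second has lost the EcoRI site through one substitution.
allele_1 = "AAAGAATTCTTTT"
allele_2 = "AAAGAGTTCTTTT"
print(digest_lengths(allele_1), digest_lengths(allele_2))
print(same_rflp_pattern(allele_1, allele_2))
```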

In a particular aspect, oligonucleotide primers derived from the dithp of the invention may be
used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and
deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP
detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and
fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from the polynucleotide
sequences encoding DITHP are used to amplify DNA using the polymerase chain reaction (PCR). The
DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and
the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products
in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing
gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the
amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence
database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by
comparing the sequences of individual overlapping DNA fragments which assemble into a common
consensus sequence. These computer-based methods filter out sequence variations due to laboratory
preparation of DNA and sequencing errors using statistical models and automated analyses of DNA
sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass
spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San
Diego CA).
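
The in silico approach can be illustrated with a deliberately simplified sketch: overlapping fragments that are already aligned to one another are compared column by column, and a position is reported only when the minor base is seen in at least a minimum number of fragments. That count threshold stands in for the statistical models mentioned above and is an assumption of this example, as are the function name and the reads.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_reads=2):
    """Return (position, major_base, minor_base) for columns with a supported minor base.

    Reads must be pre-aligned and of equal length; '-' marks no coverage.
    Variants seen in fewer than min_minor_reads reads are treated as likely
    sequencing errors and ignored.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    snps = []
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads if read[pos] != "-")
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_minor_reads:
            snps.append((pos, ranked[0][0], ranked[1][0]))
    return snps


# Hypothetical aligned fragments: position 2 varies in two reads, position 4 in one.
reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA", "ACGTT"]
print(candidate_snps(reads))
```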

DNA-based identification techniques are critical in forensic technology. DNA sequences taken
from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva,
semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H. (1992)
PCR Technology, Freeman and Co., New York, NY). Similarly, polynucleotides of the present
invention can be used as polymorphic markers.

There is also a need for reagents capable of identifying the source of a particular tissue. 
Appropriate reagents can comprise, for example, DNA probes or primers prepared from the sequences
of the present invention that are specific for particular tissues. Panels of such reagents can identify
tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue
cultures for contamination.

The polynucleotides of the present invention can also be used as molecular weight markers on 
nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a
particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel
polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and
as an antigen to elicit an immune response.

Disease Model Systems Using dithp

The dithp of the invention or their mammalian homologs may be "knocked out" in an animal
model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well
known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S.
Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells, such as
the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES
cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the
neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292). The vector
integrates into the corresponding region of the host genome by homologous recombination.
Alternatively, homologous recombination takes place using the Cre-loxP system to knockout a gene of
interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) Clin. Invest. 97:1999-2002;
Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are
identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain.
The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny
are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus
generated may be tested with potential therapeutic or toxic agents.

The dithp of the invention may also be manipulated in vitro in ES cells derived from human
blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages
including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for
example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science
282:1145-1147).

The dithp of the invention can also be used to create "knockin" humanized animals (pigs) or
transgenic animals (mice or rats) to model human disease. With knockin technology, a region of dithp
is injected into animal ES cells, and the injected sequence integrates into the animal cell genome.
Transformed cells are injected into blastulae, and the blastulae are implanted as described above.
Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to
obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress
dithp, resulting, e.g., in the secretion of DITHP in its milk, may also serve as a convenient source of
that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).

Screening Assays

DITHP encoded by polynucleotides of the present invention may be used to screen for
molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and
the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the
polypeptide or the bound molecule. Examples of such molecules include antibodies, oligonucleotides,
proteins (e.g., receptors), or small molecules.

Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand
or fragment thereof, a natural substrate, or a structural or functional mimetic. (See, Coligan et al.,

related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor,
e.g., the active site. In either case, the molecule can be rationally designed using known techniques.
Preferably, the screening for these molecules involves producing appropriate cells which express the
polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from
mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions
which contain the expressed polypeptide are then contacted with a test compound and binding,
stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.

An assay may simply test binding of a candidate compound to the polypeptide, wherein binding
is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. Alternatively,
the assay may assess binding in the presence of a labeled competitor.

Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule
affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply
comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring
polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to
a standard.
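
The final comparison step can be sketched as expressing the measured activity as a percentage of the standard and sorting candidates by the direction of the change. The 20 percent tolerance band, the labels, and the readings below are assumptions made for illustration only.

```python
def percent_of_standard(measured: float, standard: float) -> float:
    """Measured activity or binding expressed as a percentage of the standard."""
    if standard <= 0:
        raise ValueError("standard must be positive")
    return 100.0 * measured / standard


def classify_candidate(measured: float, standard: float, tolerance_pct: float = 20.0) -> str:
    """Label a candidate compound by its effect relative to the standard."""
    pct = percent_of_standard(measured, standard)
    if pct > 100.0 + tolerance_pct:
        return "increases activity"
    if pct < 100.0 - tolerance_pct:
        return "decreases activity"
    return "no clear effect"


# Hypothetical assay readings (arbitrary units) against a standard of 100.
for compound, reading in [("cmpd-1", 150.0), ("cmpd-2", 40.0), ("cmpd-3", 105.0)]:
    print(compound, classify_candidate(reading, 100.0))
```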

Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure
polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly or
indirectly, to the polypeptide or by competing with the polypeptide for a substrate.

All of the above assays can be used in a diagnostic or prognostic context. The molecules
discovered using these assays can be used to treat disease or to bring about a particular result in a
patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the
assays can discover agents which may inhibit or enhance the production of the polypeptide from
suitably manipulated cells or tissues.

Transcript Imaging

Another embodiment relates to the use of dithp to develop a transcript image of a tissue or cell
type. A transcript image is the collective pattern of gene expression by a particular tissue or cell type
under given conditions and at a given time. This pattern of gene expression is defined by the number of
expressed genes, their abundance, and their function. Thus the dithp of the present invention may be
used to develop a transcript image of a tissue or cell type by hybridizing, preferably in a microarray
format, the dithp of the present invention to the totality of transcripts or reverse transcripts of a tissue
or cell type. The resultant transcript image would provide a profile of gene activity pertaining to human
molecules for diagnostics and therapeutics.

Transcript images which profile dithp expression may be generated using transcripts isolated
from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect
dithp expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell
line. Transcript images may be used to profile dithp expression in distinct tissue types. This process
can be used to determine the activity of human diagnostic and therapeutic molecules in a particular
tissue type relative to this activity in a different tissue type. Transcript images may be used to generate
a profile of dithp expression characteristic of diseased tissue. Transcript images of tissues before and
after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to
monitor the efficacy of drug treatments for diseases which affect the activity of human diagnostic and
therapeutic molecules.

Transcript images which profile dithp expression may also be used in conjunction with in vitro 
model systems and preclinical evaluation of pharmaceuticals. Transcript images of cell lines can be
used to assess the activity of human diagnostic and therapeutic molecules and/or to identify cell lines
that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents,
and a transcript image following treatment may indicate the efficacy of these agents in restoring desired
levels of this activity. A similar approach may be used to assess the toxicity of pharmaceutical agents
as reflected by undesirable changes in the activity of human diagnostic and therapeutic molecules.
Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images
with those of pharmaceutical agents of known effectiveness. 
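
Comparing two transcript images can be sketched as a correlation between their expression vectors, with each position holding the signal for the same dithp probe in both images. Pearson correlation is one reasonable choice and is an assumption of this example; the function name and signal values are hypothetical.

```python
import math


def pearson(image_a, image_b) -> float:
    """Pearson correlation between two transcript images of equal length."""
    if len(image_a) != len(image_b) or len(image_a) < 2:
        raise ValueError("images must have the same length (at least 2 probes)")
    n = len(image_a)
    mean_a = sum(image_a) / n
    mean_b = sum(image_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(image_a, image_b))
    var_a = sum((a - mean_a) ** 2 for a in image_a)
    var_b = sum((b - mean_b) ** 2 for b in image_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("an image with constant signal has no defined correlation")
    return covariance / math.sqrt(var_a * var_b)


# Hypothetical signals for five probes after treatment with a reference agent
# of known effectiveness and with a candidate agent.
reference_agent = [1.0, 4.0, 2.0, 8.0, 5.0]
candidate_agent = [1.2, 3.8, 2.5, 7.5, 5.1]
print(round(pearson(reference_agent, candidate_agent), 3))
```

A correlation close to 1 would suggest that the candidate produces an expression pattern similar to that of the reference agent; how close is close enough would need to be established empirically.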

Antisense Molecules 

The polynucleotides of the present invention are useful in antisense technology. Antisense 
technology or therapy relies on the modulation of expression of a target protein through the specific
binding of an antisense sequence to a target sequence encoding the target protein or directing its
expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa
NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol.
40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y. et
al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence
capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences
bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense
sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991)
Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge,
W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P.E. and Haaima, G.
(1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression
occurs through hybridization or binding of complementary base pairs. Antisense sequences can also
bind to DNA duplexes through specific interactions in the major groove of the double helix.
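
Because an antisense sequence pairs with its target through complementary bases, a DNA antisense oligonucleotide for a given target region is simply the reverse complement of that region. The following minimal sketch accepts either DNA or RNA notation for the target; the function name and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(target: str) -> str:
    """Return the DNA antisense (reverse complement) of a target, written 5' to 3'."""
    seq = target.upper().replace("U", "T")  # accept mRNA written with U
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in target: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical target region around a start codon, written as mRNA.
print(antisense_dna("AUGGCC"))
```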

The polynucleotides of the present invention and fragments thereof can be used as antisense
sequences to modify the expression of the polypeptide encoded by dithp. The antisense sequences can
be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (PE Biosystems)
or other automated systems known in the art. Antisense sequences can also be produced biologically,
such as by transforming an appropriate host cell with an expression vector containing the sequence of
interest. (See, e.g., Agrawal, supra.)

In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences 
into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the
form of an expression plasmid which, upon transcription, produces a sequence complementary to at
least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J.E., et al. (1998)
J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J., et al. (1995) 9(13):1288-1296.)
Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as
retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood 76:271; Ausubel,
F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York NY; Uckert,
W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include
liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g.,
Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315;
and Morris, M.C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)

Expression

In order to express a biologically active DITHP, the nucleotide sequences encoding DITHP or
fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains
the necessary elements for transcriptional and translational control of the inserted coding sequence in a
suitable host. Methods which are well known to those skilled in the art may be used to construct
expression vectors containing sequences encoding DITHP and appropriate transcriptional and
translational control elements. These methods include in vitro recombinant DNA techniques, synthetic
techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8, 16, and 17;
and Ausubel, supra, Chapters 9, 10, 13, and 16.)
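
Before a coding sequence is placed in an expression vector, it can be useful to confirm that it forms a complete open reading frame. The check below is a hypothetical pre-cloning sanity test, not a step described in the disclosure; it assumes the standard genetic code with ATG as the start codon and TAA, TAG, or TGA as stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(cds: str) -> bool:
    """True if the sequence starts with ATG, ends with a stop codon,
    has a length divisible by three, and contains no internal stop codon."""
    seq = cds.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


# Hypothetical inserts: a complete frame, and one with a premature stop.
print(is_complete_orf("ATGGCCAAATAA"))
print(is_complete_orf("ATGTAAGCCTAA"))
```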

A variety of expression vector/host systems may be utilized to contain and express sequences
encoding DITHP. These include, but are not limited to, microorganisms such as bacteria transformed
with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with
yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus);
plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or
tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or
animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G.
and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol.
153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994)
Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945;
Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie,
R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105;
The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp.
191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington,
J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses,
or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide
sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998)
Cancer Gen. Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344;
Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. (1994) Mol. Immunol.
31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.) The invention is not
limited by the host cell employed.

For long term production of recombinant proteins in mammalian systems, stable expression of
DITHP in cell lines is preferred. For example, sequences encoding DITHP can be transformed into cell
lines using expression vectors which may contain viral origins of replication and/or endogenous
expression elements and a selectable marker gene on the same or on a separate vector. Any number of
selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al. (1977)
Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl. Acad.
Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman, S.C. and
R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995) Methods Mol.
Biol. 55:121-131.)



Therapeutic Uses of dithp 




The dithp of the invention may be used for somatic or germline gene therapy. Gene therapy
may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined
immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et
al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an
inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480;
Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-
216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene
Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor
VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and Somia, N.
(1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of
cancers which result from unregulated cell proliferation), or (iii) express a protein which affords
protection against intracellular parasites (e.g., against human retroviruses, such as human
immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996)
Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites,
such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as
Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in dithp
expression or regulation causes disease, the expression of dithp from an appropriate population of
transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.

In a further embodiment of the invention, diseases or disorders caused by deficiencies in dithp
are treated by constructing mammalian expression vectors comprising dithp and introducing these
vectors by mechanical means into dithp-deficient cells. Mechanical transfer technologies for use with
cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii) ballistic gold
particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the
use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217;
Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Récipon, H. (1998) Curr. Opin. Biotechnol. 9:445-
450).

Expression vectors that may be effective for the expression of dithp include, but are not limited
to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA),
PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF,
PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). The dithp of the invention
may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous
sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter
(e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci.
U.S.A. 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V. and Blau,
H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid
(Invitrogen)); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND;
Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter
(Rossi, F.M.V. and Blau, H.M., supra)), or (iii) a tissue-specific promoter or the native promoter of the
endogenous gene encoding DITHP from a normal individual.
Commercially available liposome transformation kits (e.g., the PERFECT LIPID
TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver
polynucleotides to target cells in culture and require minimal effort to optimize experimental
parameters. In the alternative, transformation is performed using the calcium phosphate method
(Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al.
(1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these
standardized mammalian transfection protocols.

In another embodiment of the invention, diseases or disorders caused by genetic defects with
respect to dithp expression are treated by constructing a retrovirus vector consisting of (i) the dithp of
the invention under the control of an independent promoter or the retrovirus long terminal repeat (LTR)
promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with
additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector
propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and
are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A. 92:6733-6737),
incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line
(VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a
promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650;
Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J. Virol.
62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol.
72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging
cell lines producing high transducing efficiency retroviral supernatant") discloses a method for
obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of
retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of
transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and
have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997)
Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc.
Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).

In the alternative, an adenovirus-based gene therapy delivery system is used to deliver dithp to
cells which have one or more genetic abnormalities with respect to the expression of dithp. The
construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in
the art. Replication defective adenovirus vectors have proven to be versatile for importing genes
encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995)
Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent
Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by
reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544
and Verma, I.M. and Somia, N. (1997) Nature 18:389:239-242, both incorporated by reference herein.

In another alternative, a herpes-based gene therapy delivery system is used to deliver dithp to
target cells which have one or more genetic abnormalities with respect to the expression of dithp. The
use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing dithp to
cells of the central nervous system, for which HSV has a tropism. The construction and packaging of
herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent
herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of
primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of a HSV-1 virus vector
has also been disclosed in detail in U.S. Patent Number 5,804,413 to DeLuca ("Herpes simplex virus
strains for gene transfer"), which is hereby incorporated by reference. U.S. Patent Number 5,804,413
teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous
gene to be transferred to a cell under the control of the appropriate promoter for purposes including
human gene therapy. Also taught by this patent are the construction and use of recombinant HSV
strains deleted for ICP4, ICP27 and ICP22. For HSV vectors, see also Goins, W.F. et al. (1999) J.
Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference.
The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the
transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the
growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well
known to those of ordinary skill in the art.

In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to
deliver dithp to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has
been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and
Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a subgenomic
RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to
higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins
relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly,
inserting dithp into the alphavirus genome in place of the capsid-coding region results in the production
of a large number of dithp RNAs and the synthesis of high levels of DITHP in vector transduced cells.
While alphavirus infection is typically associated with cell lysis within a few days, the ability to
establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus
(SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene
therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host range of
alphaviruses will allow the introduction of DITHP into a variety of cell types. The specific
transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA
and RNA transfections, and performing alphavirus infections, are well known to those with ordinary
skill in the art.

Antibodies 

Anti-DITHP antibodies may be used to analyze protein expression levels. Such antibodies
include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. For
descriptions of and protocols of antibody technologies, see, e.g., Pound, J.D. (1998) Immunochemical
Protocols, Humana Press, Totowa, NJ.

The amino acid sequence encoded by the dithp of the Sequence Listing may be analyzed by
appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions
of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus, the
N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be exposed
to the external environment when the polypeptide is in its natural conformation. Analysis used to select
appropriate epitopes is also described by Ausubel (1997, supra, Chapter 11.7). Peptides used for
antibody induction do not need to have biological activity; however, they must be antigenic. Peptides
used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids,
preferably at least 10 amino acids, and most preferably 15 amino acids. A peptide which mimics an
antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet
hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide encompassing an antigenic
region may be expressed from a dithp, synthesized as described above, or purified from human cells.
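As a rough illustration of how candidate hydrophilic peptides might be ranked, the sketch below scans a protein sequence with a 15-residue window and scores each window on the Kyte-Doolittle hydropathy scale (lower means more hydrophilic). This is not the LASERGENE procedure named in the text; the choice of scale, the default window width, and the function name most_hydrophilic_window are assumptions made only for illustration.

```python
# Kyte-Doolittle hydropathy values; negative values indicate hydrophilic residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(protein, width=15):
    """Return (start, peptide, mean_hydropathy) for the most hydrophilic window.

    Hypothetical helper: a simple sliding-window scan, assumed here as a
    stand-in for commercial immunogenicity analysis software.
    """
    protein = protein.upper()
    if len(protein) < width:
        raise ValueError("protein is shorter than the peptide window")
    best_start, best_score = 0, float("inf")
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        score = sum(KYTE_DOOLITTLE[aa] for aa in window) / width
        if score < best_score:  # keep the first window with the lowest mean
            best_start, best_score = start, score
    return best_start, protein[best_start:best_start + width], best_score
```

A window chosen this way would still need to be checked for antigenicity and surface exposure; hydropathy alone is only a first filter.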

Procedures well known in the art may be used for the production of antibodies. Various hosts
including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the
host species, various adjuvants may be used to increase immunological response.

In one procedure, peptides about 15 residues in length may be synthesized using an ABI 431A
peptide synthesizer (PE Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction
with M-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are
immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are
tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin
(BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG.
Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the
art, including ELISA, radioimmunoassay (RIA), and immunoblotting.

In another procedure, isolated and purified peptide may be used to immunize mice (about 100
μg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used
to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive
cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is
sufficient for labeling and screening several thousand clones. Hybridomas of interest are detected by
screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal
antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson, Palo Alto, CA)
are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species IgG) antibodies at
10 mg/ml. The coated wells are blocked with 1% BSA and washed and exposed to supernatants from
hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml.

Clones producing antibodies bind a quantity of labeled peptide that is detectable above
background. Such clones are expanded and subjected to 2 cycles of cloning. Cloned hybridomas are
injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the
ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several
procedures for the production of monoclonal antibodies, including in vitro production, are described in
Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-DITHP activity
using protocols well known in the art, including ELISA, RIA, and immunoblotting.

Antibody fragments containing specific binding sites for an epitope may also be generated. For
example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin
digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of
the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous
bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity
(Pound, supra, Chaps. 45-47). Antibodies generated against polypeptide encoded by dithp can be used
to purify and characterize full-length DITHP protein and its activity, binding partners, etc.

Assays Using Antibodies 

Anti-DITHP antibodies may be used in assays to quantify the amount of DITHP found in a
particular human cell. Such assays include methods utilizing the antibody and a label to detect
expression level under normal or disease conditions. The peptides and antibodies of the invention may
be used with or without modification or labeled by joining them, either covalently or noncovalently,
with a reporter molecule.

Protocols for detecting and measuring protein expression using either polyclonal or monoclonal
antibodies are well known in the art. Examples include ELISA, RIA, and fluorescent activated cell
sorting (FACS). Such immunoassays typically involve the formation of complexes between the DITHP
and its specific antibody and the measurement of such complexes. These and other assays are described

Without further elaboration, it is believed that one skilled in the art can, using the preceding
description, utilize the present invention to its fullest extent. The following preferred specific
embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of
the disclosure in any way whatsoever.

The disclosures of all patents, applications, and publications mentioned above and below, in
particular U.S. Ser. No. 60/137,109, U.S. Ser. No. 60/137,337, U.S. Ser. No. 60/137,258, U.S. Ser.
No. 60/137,260, U.S. Ser. No. 60/137,113, U.S. Ser. No. 60/137,161, U.S. Ser. No. 60/137,417, U.S.
Ser. No. 60/137,259, U.S. Ser. No. 60/137,396, U.S. Ser. No. 60/137,114, U.S. Ser. No. 60/137,173,
U.S. Ser. No. 60/137,411, U.S. Ser. No. 60/147,436, U.S. Ser. No. 60/147,549, U.S. Ser. No.
60/147,377, U.S. Ser. No. 60/147,527, U.S. Ser. No. 60/147,520, U.S. Ser. No. 60/147,536, U.S. Ser.
No. 60/147,530, U.S. Ser. No. 60/147,547, U.S. Ser. No. 60/147,824, U.S. Ser. No. 60/147,541, U.S.
Ser. No. 60/147,542, and U.S. Ser. No. 60/147,500, are hereby expressly incorporated by reference.

EXAMPLES 

I. Construction of cDNA Libraries

RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from
various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others
were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life
Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates
were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated with either
isopropanol or sodium acetate and ethanol, or by other routine methods.

Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA
purity. In most cases, RNA was treated with DNase. For most libraries, poly(A+) RNA was isolated
using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI),
OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA
purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other
RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).

In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA
libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP
vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT
plasmid system (Life Technologies), using the recommended procedures or similar methods known in
the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was
initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double
stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For
most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE
CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or
preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of
the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), pSPORT1 plasmid
(Life Technologies), or pINCY (Incyte). Recombinant plasmids were transformed into competent E.
coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or
ElectroMAX DH10B from Life Technologies.

II. Isolation of cDNA Genes

Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system
(Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or
WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge
BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra
plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN). Following
precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without
lyophilization, at 4°C.

Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a
high-throughput format. (Rao, V.B. (1994) Anal. Biochem. 216:1-14.) Host cell lysis and thermal
cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well
plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using
PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a FLUOROSKAN II
fluorescence scanner (Labsystems Oy, Helsinki, Finland).

III. Sequencing and Analysis 

cDNA sequencing reactions were processed using standard methods or high-throughput
instrumentation such as the ABI CATALYST 800 thermal cycler (PE Biosystems) or the PTC-200
thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific
Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing
reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI
sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE
Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled
polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular
Dynamics); the ABI PRISM 373 or 377 sequencing system (PE Biosystems) in conjunction with
standard ABI protocols and base calling software; or other sequence analysis systems known in the art.
Reading frames within the cDNA sequences were identified using standard methods (reviewed in
Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using
the techniques disclosed in Example VIII.

IV. Assembly and Analysis of Sequences
Component sequences from chromatograms were subject to PHRED analysis and assigned a
quality score. The sequences having at least a required quality score were subject to various
pre-processing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA
tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and
sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive elements
(e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n's", or masked, to prevent spurious
matches.
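The pre-processing steps described above can be sketched in a few lines. The example below is a minimal illustration, not the pipeline actually used: it trims a terminal polyA tail, masks dinucleotide repeats with "n", and drops reads shorter than 50 base pairs. The function names, the regular expressions, and the repeat and tail length parameters (min_units, min_run) are assumptions; only the 50 bp minimum and the use of "n" for masking come from the text.

```python
import re

MIN_LENGTH = 50  # reads smaller than 50 base pairs are eliminated (from the text)


def trim_poly_a_tail(seq, min_run=10):
    """Remove a terminal run of at least min_run A's (min_run is an assumption)."""
    return re.sub(r"A{%d,}$" % min_run, "", seq, flags=re.IGNORECASE)


def mask_dinucleotide_repeats(seq, min_units=6):
    """Replace runs of a repeated dinucleotide (e.g. CACACA...) with 'n'.

    A run must contain at least min_units copies of the same two bases;
    min_units is an assumed illustrative threshold.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1), re.IGNORECASE)
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)


def preprocess(reads):
    """Trim, mask, and length-filter a list of read sequences."""
    kept = []
    for seq in reads:
        seq = trim_poly_a_tail(seq)
        seq = mask_dinucleotide_repeats(seq)
        if len(seq) >= MIN_LENGTH:
            kept.append(seq)
    return kept
```

Masking rather than deleting repeats keeps the coordinates of the remaining bases unchanged, which matters when component positions are later reported along a template.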

Processed sequences were then subject to assembly procedures in which the sequences were
assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene bin
were assembled to produce consensus sequences (templates). Subsequent new sequences were added to
existing bins using BLASTn (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as
all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local
identity were accepted into the bin. The component sequences from each bin were assembled using a
version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP
PHRAP. The orientation (sense or antisense) of each assembled template was determined based on the
number and orientation of its component sequences. Template sequences as disclosed in the sequence
listing correspond to sense strand sequences (the "forward" reading frames), to the best determination.
The complementary (antisense) strands are inherently disclosed herein. The component sequences
which were used to assemble each template consensus sequence are listed in Table 4, along with their
positions along the template nucleotide sequences.
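The rule for adding new sequences to existing bins can be illustrated with a short sketch. It applies the two thresholds stated above (score of at least 150, local identity of at least 82%) and enforces that each sequence belongs to only one bin. The Hit record, the assign_to_bins function, and the choice to consider the highest-scoring hits first are assumptions for illustration; the text does not say how ties between candidate bins were resolved.

```python
from dataclasses import dataclass

MIN_SCORE = 150            # candidate pairs: score >= 150 (from the text)
MIN_LOCAL_IDENTITY = 82.0  # accepted alignments: >= 82% local identity (from the text)


@dataclass
class Hit:
    """Hypothetical record of one alignment between a new sequence and a bin."""
    query_id: str
    bin_id: str
    score: float
    local_identity: float  # percent


def assign_to_bins(hits, bins):
    """Add each new sequence to at most one bin.

    bins maps bin_id -> set of sequence ids and is updated in place.
    Hits are visited best score first (an assumed tie-breaking policy).
    """
    assigned = {seq for members in bins.values() for seq in members}
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.query_id in assigned:
            continue  # a sequence can only belong to one bin
        if hit.score >= MIN_SCORE and hit.local_identity >= MIN_LOCAL_IDENTITY:
            bins.setdefault(hit.bin_id, set()).add(hit.query_id)
            assigned.add(hit.query_id)
    return bins
```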

Bins were compared against each other and those having local similarity of at least 82% were
combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95%
local identity) were re-split. Assembled templates were also subject to analysis by STITCHER/EXON
MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively
spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or
disease states, etc. These resulting bins were subject to several rounds of the above assembly
procedures.
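Combining bins that share sufficient local similarity is, in effect, a connected-components problem. The sketch below groups bins with a small union-find structure, merging any pair whose similarity is at least 82%. It covers only the merging step; reassembly and the 95% re-split check are not modeled. The function name merge_bins and the input format (a dictionary of pairwise similarities) are assumptions.

```python
def merge_bins(bin_ids, similarities, threshold=82.0):
    """Group bins whose pairwise local similarity meets the threshold.

    similarities maps (bin_a, bin_b) -> percent local similarity.
    Returns a sorted list of sorted groups of bin ids.
    """
    parent = {b: b for b in bin_ids}

    def find(b):
        # Follow parent links to the group representative, compressing the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for (a, b), similarity in similarities.items():
        if similarity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for b in bin_ids:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(group) for group in groups.values())
```

Note that merging is transitive in this sketch: two bins that are not directly similar end up together if each is similar to a third, which is one reason a later re-split step is useful.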

Once gene bins were generated based upon sequence alignments, bins were clone joined based
upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from
the same clone was present in a different bin, it was likely that the two bins actually belonged together
in a single bin. The resulting combined bins underwent assembly procedures to regenerate the
consensus sequences.

The final assembled templates were subsequently annotated using the following procedure.
Template sequences were analyzed using BLASTn (v2.0, NCBI) versus gbpri (GenBank version 116).
"Hits" were defined as an exact match having from 95% local identity over 200 base pairs through
100% local identity over 100 base pairs, or a homolog match having an E-value, i.e. a probability
score, of ≤ 1 x 10^-8. The hits were subject to frameshift FASTx versus GENPEPT (GenBank version
116). (See Table 5). In this analysis, a homolog match was defined as having an E-value of ≤ 1 x 10^-8.
The assembly method used above was described in "System and Methods for Analyzing Biomolecular
Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and the LIFESEQ Gold user manual (Incyte),
both incorporated by reference herein.
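The hit definitions above can be expressed as a small classifier. In the sketch below, an exact match requires 200 aligned base pairs at 95% identity, relaxing to 100 base pairs at 100% identity; the linear interpolation between those two endpoints is an assumption, since the text gives only the endpoints. The E-value cutoff for homolog matches is passed in by the caller rather than hard-coded. The function names are hypothetical.

```python
def required_length(identity):
    """Minimum aligned length (bp) for an exact match at a given percent identity.

    Returns None below 95% identity. Linear interpolation between
    (95%, 200 bp) and (100%, 100 bp) is an assumed reading of the text.
    """
    if identity < 95.0:
        return None
    return 200.0 - 20.0 * (min(identity, 100.0) - 95.0)


def classify_hit(identity, aligned_length, e_value, e_value_cutoff):
    """Label a database hit as 'exact', 'homolog', or 'none'."""
    needed = required_length(identity)
    if needed is not None and aligned_length >= needed:
        return "exact"
    if e_value <= e_value_cutoff:
        return "homolog"
    return "none"
```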

Following assembly, template sequences were subjected to motif, BLAST, and functional
analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System
Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N.
08/812,290, filed March 6, 1997; "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein.

The template sequences were further analyzed by translating each template in all three forward
reading frames and searching each translation against the Pfam database of hidden Markov model-based
protein families and domains using the HMMER software package (available to the public from
Washington University School of Medicine, St. Louis MO). Regions of templates which, when
translated, contain similarity to Pfam consensus sequences are reported in Table 2, along with
descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 x 10^-3
are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam
protein domains and families.)
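Translation in all three forward reading frames is straightforward to sketch. The example below builds the standard genetic code table and translates a template starting at offsets 0, 1, and 2. Stop codons are written as "*" and codons containing ambiguous bases (such as masked "n" positions) as "X"; those output conventions and the function names are assumptions, not details taken from the text.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered by first, second, third base over TCAG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a nucleotide sequence from its first base; incomplete codons are dropped."""
    seq = seq.upper().replace("U", "T")
    residues = []
    for pos in range(0, len(seq) - 2, 3):
        # Unknown codons (e.g. containing N) are reported as X.
        residues.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(residues)


def translate_three_forward_frames(template):
    """Return the translations of the template in forward frames 1, 2, and 3."""
    return [translate(template[offset:]) for offset in range(3)]
```

Each of the three translations would then be searched separately against the protein family models.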

Additionally, the template sequences were translated in all three forward reading frames, and
each translation was searched against hidden Markov models for signal peptide and transmembrane
domains using the HMMER software package. Construction of hidden Markov models and their usage
in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str. Biol.
6:361-365.) Regions of templates which, when translated, contain similarity to signal peptide or
transmembrane domain consensus sequences are reported in Table 3. Only those signal peptide or
transmembrane hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or
greater corresponds to at least about 91-94% true-positives in signal peptide prediction, and at least
about 75% true-positives in transmembrane domain prediction.

The results of HMMER analysis as reported in Tables 2 and 3 may support the results of
BLAST analysis as reported in Table 1 or may suggest alternative or additional properties of template-
encoded polypeptides not previously uncovered by BLAST or other analyses.
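The reporting rule for signal peptide and transmembrane hits amounts to a simple bit-score filter. The sketch below keeps only hits scoring 11 bits or more; the tuple layout (region name, score in bits) and the function name reportable_hits are assumptions, and parsing of actual HMMER output is not shown.

```python
SIGNAL_TM_CUTOFF_BITS = 11.0  # cutoff of 11 bits, as stated in the text


def reportable_hits(hits, cutoff_bits=SIGNAL_TM_CUTOFF_BITS):
    """Keep (region_name, score_bits) hits whose score is at or above the cutoff."""
    return [hit for hit in hits if hit[1] >= cutoff_bits]
```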

Template sequences are further analyzed using the bioinformatics tools listed in Table 5, or
using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi
Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template
sequences may be further queried against public databases such as the GenBank rodent, mammalian,
vertebrate, prokaryote, and eukaryote databases.

V. Analysis of Polynucleotide Expression 

Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene
and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a
particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995,
supra, ch. 4 and 16.)

Analogous computer techniques applying BLAST were used to search for identical or related
molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Pharmaceuticals). This analysis
is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the
computer search can be modified to determine whether any particular match is categorized as exact or
similar. The basis of the search is the product score, which is defined as:

BLAST Score x Percent Identity 
5 X minimum {l^gth(Seq. 1), length(Seq. 2)} 

The product score takes into account both the degree of similarity between two sequences and the length
of the sequence match. The product score is a normalized value between 0 and 100, and is calculated
as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided
by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by
assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for
every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more
than one HSP, then the pair with the highest BLAST score is used to calculate the product score. The
product score represents a balance between fractional overlap and quality in a BLAST alignment. For
example, a product score of 100 is produced only for 100% identity over the entire length of the shorter
of the two sequences being compared. A product score of 70 is produced either by 100% identity and
70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is
produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
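
The product score arithmetic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the choice to describe an HSP by its match and mismatch counts are assumptions made here, not part of the original analysis software.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """Score one ungapped HSP: +5 per matching base, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(matches: int, mismatches: int,
                  len_seq1: int, len_seq2: int) -> float:
    """BLAST score x percent identity / (5 x length of the shorter sequence).

    Assumption: the HSP passed in is the highest-scoring one for the pair.
    """
    aligned = matches + mismatches
    if aligned == 0:
        return 0.0
    percent_identity = 100.0 * matches / aligned
    shorter = min(len_seq1, len_seq2)
    return blast_score(matches, mismatches) * percent_identity / (5 * shorter)


# 100% identity over 70% of the shorter (100-base) sequence
print(product_score(70, 0, 100, 150))
# 88% identity over the full length of the shorter sequence
print(round(product_score(88, 12, 100, 100), 1))
```

With these inputs the 88% identity case evaluates to roughly 69 and the 79% identity case to roughly 49, consistent with the approximate values of 70 and 50 quoted in the text.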

VI. Tissue Distribution Profiling

A tissue distribution profile is determined for each template by compiling the cDNA library
tissue classifications of its component cDNA sequences. Each component sequence is derived from a
cDNA library constructed from a human tissue. Each human tissue is classified into one of the
following categories: cardiovascular system; connective tissue; digestive system; embryonic structures;
endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune
system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs;
skin; stomatognathic system; unclassified/mixed; or urinary tract. Template sequences, component
sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte
Genomics, Palo Alto CA).
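
As a rough illustration of how such a profile might be compiled, the following Python sketch tallies the tissue category of each component sequence and reports each category as a fraction of the total. Expressing the profile as fractions, and the function name `tissue_profile`, are assumptions made for this example; the text specifies only that the classifications are compiled.

```python
from collections import Counter


def tissue_profile(component_tissues):
    """Return the fraction of component sequences in each tissue category.

    component_tissues: one tissue category string per component cDNA sequence.
    """
    counts = Counter(component_tissues)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical template built from four component sequences
profile = tissue_profile(["liver", "liver", "skin", "nervous system"])
for category, fraction in sorted(profile.items()):
    print(f"{category}: {fraction:.0%}")
```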

VII. Transcript Image Analysis 

Transcript images are generated as described in Seilhamer et al., "Comparative Gene
Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.

VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA 

Oligonucleotide primers designed using a dithp of the Sequence Listing are used to extend the
nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the other
primer, to initiate 3' extension of the template. The initial primers may be designed using OLIGO 4.06
software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate
program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to
anneal to the target sequence at temperatures of about 68°C to about 72°C. Any stretch of nucleotides
which would result in hairpin structures and primer-primer dimerizations is avoided. Selected human
cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired,
additional or nested sets of primers are designed.
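
The length and GC-content criteria above lend themselves to a simple screen. The Python sketch below is a hypothetical pre-filter, not the OLIGO 4.06 procedure: it checks only length (22 to 30 nucleotides) and GC content (50% or more), treats the "about" limits as hard cutoffs, and does not model annealing temperature, hairpins, or primer-primer dimers.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def meets_primer_criteria(primer: str,
                          min_len: int = 22,
                          max_len: int = 30,
                          min_gc: float = 0.50) -> bool:
    """Screen a candidate primer on length and GC content only."""
    return min_len <= len(primer) <= max_len and gc_content(primer) >= min_gc


candidates = [
    "GCGCATATGCGCATATGCGCATAT",   # 24 nt, 50% GC
    "ATATATATATATATATATATATAT",   # 24 nt, 0% GC
    "GCGC",                       # too short
]
for primer in candidates:
    print(primer, meets_primer_criteria(primer))
```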

High fidelity amplification is obtained by PCR using methods well known in the art. PCR is
performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). The reaction mix
contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and
β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life
Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair
PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2
min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the
alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2:
94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times;
Step 6: 5 min; Step 7: storage at 4°C.

The concentration of DNA in each well is determined by dispensing 100 µl PICOGREEN
quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 µl of
undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated (Corning),
Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a FLUOROSKAN II
(Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA.
A 5 µl to 10 µl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel
to determine which reactions are successful in extending the sequence.

The extended nucleotides are desalted and concentrated, transferred to 384-well plates,
digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and
sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For
shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose
gels, fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones are
religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector (Amersham
Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs,
and transfected into competent E. coli cells. Transformed cells are selected on antibiotic-containing
media, individual colonies are picked and cultured overnight at 37°C in 384-well plates in LB/2x
carbenicillin liquid media.

The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham
Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1:
94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4
repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN
reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified
using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2,
v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC
DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle
sequencing ready reaction kit (PE Biosystems).

In like manner, the dithp is used to obtain regulatory sequences (promoters, introns, and
enhancers) using the procedure above, oligonucleotides designed for such extension, and an appropriate
genomic library.



IX. Labeling of Probes and Southern Hybridization Analyses 

Hybridization probes derived from the dithp of the Sequence Listing are employed for
screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and
1000 nucleotides in length is specifically described, but essentially the same procedure may be used
with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using a
T4 polynucleotide kinase, γ32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech)
buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The
probe mixture is diluted to 10^7 dpm/µg/ml hybridization buffer and used in a typical membrane-based
hybridization analysis.

The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed
through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane
(NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the
manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68°C, and
hybridization is carried out overnight at 68°C. To remove non-specific signals, blots are sequentially
washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate
(SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER cassette
(Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of standard and
experimental lanes are compared. Essentially the same procedure is employed when screening RNA.

X. Chromosome Mapping of dithp 

The cDNA sequences which were used to assemble SEQ ID NO:1-52 are compared with
sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other
implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ
ID NO:1-52 are assembled into clusters of contiguous and overlapping sequences using assembly
algorithms such as PHRAP (Table 5). Radiation hybrid and genetic mapping data available from
public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome
Research (WIGR), and Généthon are used to determine if any of the clustered sequences have been
previously mapped. Inclusion of a mapped sequence in a cluster will result in the assignment of all
sequences of that cluster, including its particular SEQ ID NO:, to that map location. The genetic map
locations of SEQ ID NO:1-52 are described as ranges, or intervals, of human chromosomes. The map
position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's
p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between
chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in
humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances
are based on genetic markers mapped by Généthon which provide boundaries for radiation hybrid
markers whose sequences were included in each of the clusters.
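
The cluster-based assignment rule (one mapped member places every sequence in its cluster) can be illustrated with a short Python sketch. The data layout, the identifiers, and the choice to take the first mapped member when a cluster contains several are all assumptions made for this example.

```python
def assign_map_locations(clusters, mapped):
    """Propagate a known map location to every sequence in the same cluster.

    clusters: dict of cluster id -> list of sequence ids in that cluster
    mapped:   dict of sequence id -> previously determined map location
    Assumption: if several members are mapped, the first one found is used.
    """
    assigned = {}
    for members in clusters.values():
        location = next((mapped[m] for m in members if m in mapped), None)
        if location is None:
            continue  # no mapped sequence in this cluster; nothing to assign
        for member in members:
            assigned[member] = location
    return assigned


# Hypothetical clusters and a single hypothetical mapped marker
clusters = {
    "cluster_1": ["SEQ ID NO:1", "est_a", "est_b"],
    "cluster_2": ["SEQ ID NO:2", "est_c"],
}
mapped = {"est_b": "chromosome 7, 10.5-12.0 cM"}
print(assign_map_locations(clusters, mapped))
```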

XI. Microarray Analysis 

Probe Preparation from Tissue or Cell Samples 

Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and
polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse
transcribed using MMLV reverse-transcriptase, 0.05 pg/µl oligo-dT primer (21mer), 1X first strand
buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40 µM dCTP,
40 µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription
reaction is performed in a 25 ml volume containing 200 ng polyA+ RNA with GEMBRIGHT kits
(Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding yeast
genomic DNA (W. Lei, unpublished). As quantitative controls, the control mRNAs at 0.002 ng, 0.02
ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction at ratios of 1:100,000, 1:10,000,
1:1000, 1:100 (w/w) to sample mRNA respectively. The control mRNAs are diluted into reverse
transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, 25:1 (w/w) to sample mRNA differential
expression patterns. After incubation at 37°C for 2 hr, each reaction sample (one with Cy3 and another
with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and incubated for 20 minutes at
85°C to stop the reaction and degrade the RNA. Probes are purified using two successive
CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo
Alto CA) and after combining, both reaction samples are ethanol precipitated using 1 ml of glycogen (1
mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol. The probe is then dried to completion
using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 µl 5X SSC/0.2%
SDS.

Microarray Preparation

Sequences of the present invention are used to generate array elements. Each array element is
amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses
primers complementary to the vector sequences flanking the cDNA insert. Array elements are
amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 µg.
Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).

Purified array elements are immobilized on polymer-coated glass slides. Glass microscope
slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water
washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific
Products Corporation (VWR), West Chester PA), washed extensively in distilled water, and coated
with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.

Array elements are applied to the coated glass substrate using a procedure described in US
Patent No. 5,807,522, incorporated herein by reference. 1 µl of the array element DNA, at an average
concentration of 100 ng/µl, is loaded into the open capillary printing element by a high-speed robotic
apparatus. The apparatus then deposits about 5 nl of array element sample per slide.

Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene).
Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water.
Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate
buffered saline (PBS) (Tropix, Inc., Bedford MA) for 30 minutes at 60°C followed by washes in 0.2%
SDS and distilled water as before.

Hybridization 

Hybridization reactions contain 9 µl of probe mixture consisting of 0.2 µg each of Cy3 and
Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture
is heated to 65°C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8
cm² coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger
than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 µl of
5X SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5
hours at 60°C. The arrays are washed for 10 min at 45°C in a first wash buffer (1X SSC, 0.1% SDS),
three times for 10 minutes each at 45°C in a second wash buffer (0.1X SSC), and dried.

Detection

Reporter-labeled hybridization complexes are detected with a microscope equipped with an
Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines
at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is
focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide
containing the array is placed on a computer-controlled X-Y stage on the microscope and
raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned
with a resolution of 20 micrometers.

In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially.
Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477,
Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate
filters positioned between the array and the photomultiplier tubes are used to filter the signals. The
emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is
typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source,
although the apparatus is capable of recording the spectra from both fluorophores simultaneously.

The sensitivity of the scans is typically calibrated using the signal intensity generated by a
cDNA control species added to the probe mix at a known concentration. A specific location on the
array contains a complementary DNA sequence, allowing the intensity of the signal at that location to
be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from different
sources (e.g., representing test and control cells), each labeled with a different fluorophore, are
hybridized to a single array for the purpose of identifying genes that are differentially expressed, the
calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding
identical amounts of each to the hybridization mixture.

The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital
(A/D) conversion board (Analog Devices, Inc., Norwood MA) installed in an IBM-compatible PC
computer. The digitized data are displayed as an image where the signal intensity is mapped using a
linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high
signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and
measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission
spectra) between the fluorophores using each fluorophore's emission spectrum.
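
As an illustration of a linear 20-color mapping of 12-bit data, the Python sketch below bins each digitized value (0 to 4095) into one of 20 equal-width levels, with level 0 at the blue (low signal) end and level 19 at the red (high signal) end. The equal-width binning and the function name are assumptions; the text states only that the transformation is linear with 20 colors.

```python
ADC_BITS = 12          # resolution of the A/D conversion
N_COLORS = 20          # number of pseudocolor levels


def pseudocolor_index(value: int) -> int:
    """Map a 12-bit intensity to a color level: 0 (blue) through 19 (red)."""
    full_scale = 1 << ADC_BITS  # 4096 distinct codes
    if not 0 <= value < full_scale:
        raise ValueError("value is outside the 12-bit range")
    return value * N_COLORS // full_scale


for reading in (0, 2048, 4095):
    print(reading, pseudocolor_index(reading))
```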

A grid is superimposed over the fluorescence signal image such that the signal from each spot
is centered in each element of the grid. The fluorescence signal within each element is then integrated to
obtain a numerical value corresponding to the average intensity of the signal. The software used for
signal analysis is the GEMTOOLS gene expression analysis program (Incyte).

XII. Complementary Nucleic Acids

Sequences complementary to the dithp are used to detect, decrease, or inhibit expression of the
naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base pairs
is typical in the art. However, smaller or larger sequence fragments can also be used. Appropriate
oligonucleotides are designed from the dithp using OLIGO 4.06 software (National Biosciences) or
other appropriate programs and are synthesized using methods standard in the art or ordered from a
commercial supplier. To inhibit transcription, a complementary oligonucleotide is designed from the
most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence. To
inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding and
processing of the transcript.

XIII. Expression of DITHP

Expression and purification of DITHP is accomplished using bacterial or virus-based
expression systems. For expression of DITHP in bacteria, cDNA is subcloned into an appropriate
vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of
cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac)
hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator
regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g.,
BL21(DE3). Antibiotic resistant bacteria express DITHP upon induction with isopropyl
beta-D-thiogalactopyranoside (IPTG).

WO 00/73509
PCT/US00/15404

Expression of DITHP in eukaryotic cells is achieved by infecting insect
or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus
(AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is
replaced with cDNA encoding DITHP by either homologous recombination or bacterial-mediated
transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong
polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to
infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases.
Infection of the latter requires additional genetic modifications to baculovirus. (See e.g., Engelhard,
supra; and Sandig, supra.)

In most expression systems, DITHP is synthesized as a fusion protein with, e.g., glutathione
S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step,
affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton
enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized
glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia
Biotech). Following purification, the GST moiety can be proteolytically cleaved from DITHP at
specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification
using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak
Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables purification on
metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in
Ausubel (1995, supra, Chapters 10 and 16). Purified DITHP obtained by these methods can be used
directly in the following activity assay.

XIV. Demonstration of DITHP Activity 

DITHP activity is demonstrated through a variety of specific assays, some of which are
outlined below.

Oxidoreductase activity of DITHP is measured by the increase in extinction coefficient of
NAD(P)H coenzyme at 340 nm for the measurement of oxidation activity, or the decrease in extinction
coefficient of NAD(P)H coenzyme at 340 nm for the measurement of reduction activity (Dalziel, K.
(1963) J. Biol. Chem. 238:2850-2858). One of three substrates may be used: Asn-βGal, biocytidine, or
ubiquinone-10. The respective subunits of the enzyme reaction, for example, cytochrome c1-b
oxidoreductase and cytochrome c, are reconstituted. The reaction mixture contains a) 1-2 mg/ml
DITHP; and b) 15 mM substrate, 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation
reaction), or 2.0 mM NAD(P)H, in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction); in a total
volume of 0.1 ml. Changes in absorbance at 340 nm (A340) are measured at 23.5°C using a recording
spectrophotometer (Shimadzu Scientific Instruments, Inc., Pleasanton CA). The amount of NAD(P)H
is stoichiometrically equivalent to the amount of substrate initially present, and the change in A340 is a
direct measure of the amount of NAD(P)H produced; ΔA340 = 6620[NADH]. Oxidoreductase activity
of DITHP is proportional to the amount of NAD(P)H present in the assay.
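
The relation ΔA340 = 6620[NADH] can be rearranged to estimate the NADH concentration from a measured absorbance change. The Python sketch below uses the coefficient exactly as stated in the text; interpreting it as a molar coefficient for a 1 cm light path, and the function name, are assumptions made for illustration.

```python
COEFFICIENT_340 = 6620.0  # per molar per cm, as given in the text


def nadh_molar(delta_a340: float, path_cm: float = 1.0) -> float:
    """Estimate [NADH] in mol/l from the change in absorbance at 340 nm.

    Assumes the Beer-Lambert form: delta_A = coefficient x path x concentration.
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return delta_a340 / (COEFFICIENT_340 * path_cm)


# A hypothetical absorbance change of 0.662 in a 1 cm cuvette
print(f"{nadh_molar(0.662) * 1e6:.1f} micromolar")
```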

Transferase activity of DITHP is measured through assays such as a methyl transferase assay
in which the transfer of radiolabeled methyl groups between a donor substrate and an acceptor
substrate is measured (Bokar, J. A. et al. (1994) J. Biol. Chem. 269:17697-17704). Reaction mixtures
(50 µl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3%
polyvinylalcohol, 1.5 µCi [methyl-3H]AdoMet (0.375 µM AdoMet) (DuPont-NEN), 0.6 µg DITHP,
and acceptor substrate (0.4 µg [35S]RNA or 6-mercaptopurine (6-MP) to 1 mM final concentration).
Reaction mixtures are incubated at 30°C for 30 minutes, then 65°C for 5 minutes. The products are
separated by chromatography or electrophoresis and the level of methyl transferase activity is
determined by quantification of methyl-3H recovery.

DITHP hydrolase activity is measured by the hydrolysis of appropriate synthetic peptide
substrates conjugated with various chromogenic molecules in which the degree of hydrolysis is
quantified by spectrophotometric (or fluorometric) absorption of the released chromophore (Beynon,
R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New
York NY, pp. 25-55). Peptide substrates are designed according to the category of protease activity as
endopeptidase (serine, cysteine, aspartic proteases), aminopeptidase (leucine aminopeptidase), or
carboxypeptidase (carboxypeptidase A and B, procollagen C-proteinase).

DITHP isomerase activity such as peptidyl prolyl cis/trans isomerase activity can be assayed
by an enzyme assay described by Rahfeld, J.U., et al. (1994) (FEBS Lett. 352:180-184). The assay
is performed at 10°C in 35 mM HEPES buffer, pH 7.8, containing chymotrypsin (0.5 mg/ml) and
DITHP at a variety of concentrations. Under these assay conditions, the substrate,
Suc-Ala-Xaa-Pro-Phe-4-NA, is in equilibrium with respect to the prolyl bond, with 80-95% in trans
and 5-20% in cis conformation. An aliquot (2 µl) of the substrate dissolved in dimethyl sulfoxide
(10 mg/ml) is added to the reaction mixture described above. Only the cis isomer of the substrate is a
substrate for cleavage by chymotrypsin. Thus, as the substrate is isomerized by DITHP, the product is
cleaved by chymotrypsin to produce 4-nitroanilide, which is detected by its absorbance at 390 nm.
4-Nitroanilide appears in a time-dependent and a DITHP concentration-dependent manner.

An assay for DITHP activity associated with growth and development measures cell
proliferation as the amount of newly initiated DNA synthesis in Swiss mouse 3T3 cells. A plasmid
containing polynucleotides encoding DITHP is transfected into quiescent 3T3 cultured cells using
methods well known in the art. The transiently transfected cells are then incubated in the presence of
[3H]thymidine, a radioactive DNA precursor. Where applicable, varying amounts of DITHP ligand are
added to the transfected cells. Incorporation of [3H]thymidine into acid-precipitable DNA is measured
over an appropriate time interval, and the amount incorporated is directly proportional to the amount of
newly synthesized DNA.

Growth factor activity of DITHP is measured by the stimulation of DNA synthesis in Swiss
mouse 3T3 cells (McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford
University Press, New York NY). Initiation of DNA synthesis indicates the cells' entry into the mitotic
cycle and their commitment to undergo later division. 3T3 cells are competent to respond to most
growth factors, not only those that are mitogenic, but also those that are involved in embryonic
induction. This competence is possible because the in vivo specificity demonstrated by some growth
factors is not necessarily inherent but is determined by the responding tissue. In this assay, varying
amounts of DITHP are added to quiescent 3T3 cultured cells in the presence of [3H]thymidine, a
radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from
biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over
an appropriate time interval, and the amount incorporated is directly proportional to the amount of
newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP
concentration range is indicative of growth factor activity. One unit of activity per milliliter is defined
as the concentration of DITHP producing a 50% response level, where 100% represents maximal
incorporation of [3H]thymidine into acid-precipitable DNA.
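
One way to locate the concentration giving the 50% response level from a measured dose-response series is simple interpolation between the two bracketing data points. The Python sketch below is a hypothetical illustration: the function name, the use of linear interpolation on untransformed concentrations, and the example values are all assumptions, and the text does not prescribe a fitting method.

```python
def half_max_concentration(concentrations, responses):
    """Interpolate the concentration at which the response is 50% of maximum.

    concentrations: ascending DITHP concentrations
    responses:      [3H]thymidine incorporation measured at each concentration
    Assumption: straight-line interpolation between adjacent data points.
    """
    if len(concentrations) != len(responses) or len(concentrations) < 2:
        raise ValueError("need two or more paired measurements")
    target = 0.5 * max(responses)
    points = list(zip(concentrations, responses))
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1 and r1 != r0:
            return c0 + (target - r0) * (c1 - c0) / (r1 - r0)
    raise ValueError("response never crosses 50% of its maximum")


# Hypothetical series: concentration (arbitrary units) vs. counts incorporated
print(half_max_concentration([0, 1, 3, 4], [0, 25, 75, 100]))
```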

Alternatively, an assay for cytokine activity of DITHP measures the proliferation of
leukocytes. In this assay, the amount of tritiated thymidine incorporated into newly synthesized DNA
is used to estimate proliferative activity. Varying amounts of DITHP are added to cultured
leukocytes, such as granulocytes, monocytes, or lymphocytes, in the presence of [3H]thymidine, a
radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from
biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured
over an appropriate time interval, and the amount incorporated is directly proportional to the amount
of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP
concentration range is indicative of DITHP activity. One unit of activity per milliliter is
conventionally defined as the concentration of DITHP producing a 50% response level, where 100%
represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.

An alternative assay for DITHP cytokine activity utilizes a Boyden micro chamber
(Neuroprobe, Cabin John MD) to measure leukocyte chemotaxis (Vicari, supra). In this assay, about
10^5 migratory cells such as macrophages or monocytes are placed in cell culture media in the upper
compartment of the chamber. Varying dilutions of DITHP are placed in the lower compartment. The
two compartments are separated by a 5 or 8 micron pore polycarbonate filter (Nucleopore, Pleasanton
CA). After incubation at 37°C for 80 to 120 minutes, the filters are fixed in methanol and stained
with appropriate labeling agents. Cells which migrate to the other side of the filter are counted using
standard microscopy. The chemotactic index is calculated by dividing the number of migratory cells
counted when DITHP is present in the lower compartment by the number of migratory cells counted
when only media is present in the lower compartment. The chemotactic index is proportional to the
activity of DITHP.
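
The chemotactic index defined above is a simple ratio, sketched here in Python. The function name and the example cell counts are hypothetical.

```python
def chemotactic_index(migrated_with_dithp: int, migrated_media_only: int) -> float:
    """Cells migrating toward DITHP divided by cells migrating toward media alone."""
    if migrated_media_only <= 0:
        raise ValueError("the media-only control count must be positive")
    return migrated_with_dithp / migrated_media_only


# Hypothetical counts: 150 cells with DITHP in the lower compartment, 50 without
print(chemotactic_index(150, 50))
```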

Alternatively, cell lines or tissues transformed with a vector containing dithp can be assayed for
DITHP activity by immunoblotting. Cells are denatured in SDS in the presence of β-mercaptoethanol,
nucleic acids removed by ethanol precipitation, and proteins purified by acetone precipitation. Pellets
are resuspended in 20 mM tris buffer at pH 7.5 and incubated with Protein G-Sepharose pre-coated
with an antibody specific for DITHP. After washing, the Sepharose beads are boiled in electrophoresis
sample buffer, and the eluted proteins subjected to SDS-PAGE. The SDS-PAGE is transferred to a
nitrocellulose membrane for immunoblotting, and the DITHP activity is assessed by visualizing and
quantifying bands on the blot using the antibody specific for DITHP as the primary antibody and
125I-labeled IgG specific for the primary antibody as the secondary antibody.

DITHP kinase activity is measured by phosphorylation of a protein substrate using γ-labeled
[32P]-ATP and quantitation of the incorporated radioactivity using a radioisotope counter. DITHP is
incubated with the protein substrate, [32P]-ATP, and an appropriate kinase buffer. The [32P]
incorporated into the product is separated from free [32P]-ATP by electrophoresis and the incorporated
[32P] is counted. The amount of [32P] recovered is proportional to the kinase activity of DITHP in
the assay. A determination of the specific amino acid residue phosphorylated is made by
phosphoamino acid analysis of the hydrolyzed protein.

In the alternative, DITHP activity is measured by the increase in cell proliferation resulting
from transformation of a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic
expression vector encoding DITHP. Eukaryotic expression vectors are commercially available, and
the techniques to introduce them into cells are well known to those skilled in the art. The cells are
incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow
expression of DITHP. Phase microscopy is then used to compare the mitotic index of transformed
versus control cells. An increase in the mitotic index indicates DITHP activity.

In a further alternative, an assay for DITHP signaling activity is based upon the ability of
GPCR family proteins to modulate G protein-activated second messenger signal transduction
pathways (e.g., cAMP; Gaudin, P. et al. (1998) J. Biol. Chem. 273:4990-4996). A plasmid encoding
full length DITHP is transfected into a mammalian cell line (e.g., Chinese hamster ovary (CHO) or
human embryonic kidney (HEK-293) cell lines) using methods well-known in the art. Transfected
cells are grown in 12-well trays in culture medium for 48 hours, then the culture medium is discarded,
and the attached cells are gently washed with PBS. The cells are then incubated in culture medium
with or without ligand for 30 minutes, then the medium is removed and cells lysed by treatment with
1 M perchloric acid. The cAMP levels in the lysate are measured by radioimmunoassay using
methods well-known in the art. Changes in the levels of cAMP in the lysate from cells exposed to
ligand compared to those without ligand are proportional to the amount of DITHP present in the
transfected cells.

Alternatively, an assay for DITHP protein phosphatase activity measures the hydrolysis of
p-nitrophenyl phosphate (PNPP). DITHP is incubated together with PNPP in HEPES buffer, pH 7.5, in
the presence of 0.1% β-mercaptoethanol at 37 °C for 60 min. The reaction is stopped by the addition of
6 ml of 10 N NaOH, and the increase in light absorbance of the reaction mixture at 410 nm resulting
from the hydrolysis of PNPP is measured using a spectrophotometer. The increase in light absorbance
is proportional to the phosphatase activity of DITHP in the assay (Diamond, R.H. et al. (1994) Mol. Cell.
Biol. 14:3752-3762).
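
The absorbance reading can be converted to an amount of product with the Beer-Lambert law. The following is a minimal sketch, not the procedure of the cited reference: the function names are hypothetical, and the molar absorptivity of the p-nitrophenolate product is passed in as a parameter because the text does not give one (the example value of 18,000 per M per cm is an assumption, as are the cuvette path length and the measured volume).

```python
def pnp_released_nmol(delta_a410: float, epsilon_per_m_cm: float,
                      path_cm: float, volume_ml: float) -> float:
    """Beer-Lambert: c = dA / (epsilon * l); amount = c * V, returned in nmol."""
    if epsilon_per_m_cm <= 0 or path_cm <= 0 or volume_ml <= 0:
        raise ValueError("epsilon, path length and volume must be positive")
    conc_molar = delta_a410 / (epsilon_per_m_cm * path_cm)
    return conc_molar * (volume_ml / 1000.0) * 1e9  # mol -> nmol


def phosphatase_rate_nmol_per_min(released_nmol: float,
                                  minutes: float = 60.0) -> float:
    """Average rate over the incubation (60 min in the assay described)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return released_nmol / minutes


# Assumed example values, not data from the text.
amount = pnp_released_nmol(delta_a410=0.36, epsilon_per_m_cm=18000.0,
                           path_cm=1.0, volume_ml=1.0)
print(f"{amount:.1f} nmol released, "
      f"{phosphatase_rate_nmol_per_min(amount):.3f} nmol/min")
```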

An alternative assay measures DITHP-mediated G-protein signaling activity by monitoring
the mobilization of Ca2+ as an indicator of the signal transduction pathway stimulation. (See, e.g.,
Grynkiewicz, G. et al. (1985) J. Biol. Chem. 260:3440; McColl, S. et al. (1993) J. Immunol.
150:4550-4555; and Aussel, C. et al. (1988) J. Immunol. 140:215-220.) The assay requires
preloading neutrophils or T cells with a fluorescent dye such as FURA-2 or BCECF (Universal
Imaging Corp, Westchester PA) whose emission characteristics are altered by Ca2+ binding. When
the cells are exposed to one or more activating stimuli artificially (e.g., anti-CD3 antibody ligation of
the T cell receptor) or physiologically (e.g., by allogeneic stimulation), Ca2+ flux takes place. This
flux can be observed and quantified by assaying the cells in a fluorometer or fluorescence-activated cell
sorter. Measurements of Ca2+ flux are compared between cells in their normal state and those
transfected with DITHP. Increased Ca2+ mobilization attributable to increased DITHP concentration
is proportional to DITHP activity.
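
For a ratiometric dye such as FURA-2, fluorescence ratios are commonly converted to free Ca2+ concentration with the calibration equation from the Grynkiewicz reference, [Ca2+] = Kd x beta x (R - Rmin) / (Rmax - R). The sketch below assumes that equation applies to the instrument in use; the function name is hypothetical, and the calibration constants in the example (Kd, beta, Rmin, Rmax) are illustrative placeholders that would have to be determined for each setup.

```python
def calcium_from_ratio(r: float, r_min: float, r_max: float,
                       kd_nm: float, beta: float) -> float:
    """Free Ca2+ (nM) from a dual-wavelength fluorescence ratio.

    r      measured ratio for the sample
    r_min  ratio at zero Ca2+ (calibration)
    r_max  ratio at saturating Ca2+ (calibration)
    kd_nm  dye dissociation constant in nM (calibration or literature value)
    beta   free/bound fluorescence ratio at the denominator wavelength
    """
    if not r_min < r < r_max:
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nm * beta * (r - r_min) / (r_max - r)


# Placeholder calibration values, not data from the text.
resting = calcium_from_ratio(r=1.0, r_min=0.5, r_max=8.0, kd_nm=224.0, beta=5.0)
stimulated = calcium_from_ratio(r=2.0, r_min=0.5, r_max=8.0, kd_nm=224.0, beta=5.0)
print(f"resting={resting:.0f} nM stimulated={stimulated:.0f} nM")
```

Comparing the values obtained for control and DITHP-transfected cells under the same calibration gives the difference in Ca2+ mobilization that the assay relies on.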

DITHP transport activity is assayed by measuring uptake of labeled substrates into Xenopus
laevis oocytes. Oocytes at stages V and VI are injected with DITHP mRNA (10 ng per oocyte) and
incubated for 3 days at 18 °C in OR2 medium (82.5 mM NaCl, 2.5 mM KCl, 1 mM CaCl2, 1 mM MgCl2,
1 mM Na2HPO4, 5 mM Hepes, 3.8 mM NaOH, 50 μg/ml gentamycin, pH 7.8) to allow expression of
DITHP protein. Oocytes are then transferred to standard uptake medium (100 mM NaCl, 2 mM KCl,
1 mM CaCl2, 1 mM MgCl2, 10 mM Hepes/Tris pH 7.5). Uptake of various substrates (e.g., amino
acids, sugars, drugs, ions, and neurotransmitters) is initiated by adding labeled substrate (e.g.,
radiolabeled with 3H, fluorescently labeled with rhodamine, etc.) to the oocytes. After incubating for 30
minutes, uptake is terminated by washing the oocytes three times in Na+-free medium, measuring the
incorporated label, and comparing with controls. DITHP transport activity is proportional to the level
of internalized labeled substrate.

DITHP transferase activity is demonstrated by a test for galactosyltransferase activity. This
can be determined by measuring the transfer of radiolabeled galactose from UDP-galactose to a
GlcNAc-terminated oligosaccharide chain (Kolbinger, F. et al. (1998) J. Biol. Chem. 273:58-65). The
sample is incubated with 14 μl of assay stock solution (180 mM sodium cacodylate, pH 6.5, 1 mg/ml
bovine serum albumin, 0.26 mM UDP-galactose, 2 μl of UDP-[3H]galactose), 1 μl of MnCl2 (500
mM), and 2.5 μl of GlcNAcβO-(CH2)8-CO2Me (37 mg/ml in dimethyl sulfoxide) for 60 minutes at
37 °C. The reaction is quenched by the addition of 1 ml of water and loaded on a C18 Sep-Pak
cartridge (Waters), and the column is washed twice with 5 ml of water to remove unreacted
UDP-[3H]galactose. The [3H]galactosylated GlcNAcβO-(CH2)8-CO2Me remains bound to the column during
the water washes and is eluted with 5 ml of methanol. Radioactivity in the eluted material is measured
by liquid scintillation counting and is proportional to galactosyltransferase activity in the starting
sample.

In the alternative, DITHP induction by heat or toxins may be demonstrated using primary
cultures of human fibroblasts or human cell lines such as CCL-13, HEK293, or HEP G2 (ATCC). To
heat induce DITHP expression, aliquots of cells are incubated at 42 °C for 15, 30, or 60 minutes.
Control aliquots are incubated at 37 °C for the same time periods. To induce DITHP expression by
toxins, aliquots of cells are treated with 100 μM arsenite or 20 mM azetidine-2-carboxylic acid for 0,
3, 6, or 12 hours. After exposure to heat, arsenite, or the amino acid analogue, samples of the treated
cells are harvested and cell lysates prepared for analysis by western blot. Cells are lysed in lysis
buffer containing 1% Nonidet P-40, 0.15 M NaCl, 50 mM Tris-HCl, 5 mM EDTA, 2 mM
N-ethylmaleimide, 2 mM phenylmethylsulfonyl fluoride, 1 mg/ml leupeptin, and 1 mg/ml pepstatin.
Twenty micrograms of the cell lysate is separated on an 8% SDS-PAGE gel and transferred to a
membrane. After blocking with 5% nonfat dry milk/phosphate-buffered saline for 1 h, the membrane
is incubated overnight at 4 °C or at room temperature for 2-4 hours with a 1:1000 dilution of
anti-DITHP serum in 2% nonfat dry milk/phosphate-buffered saline. The membrane is then washed
and incubated with a 1:1000 dilution of horseradish peroxidase-conjugated goat anti-rabbit IgG in 2%
dry milk/phosphate-buffered saline. After washing with 0.1% Tween 20 in phosphate-buffered saline,
the DITHP protein is detected and compared to controls using chemiluminescence.

Alternatively, DITHP protease activity is measured by the hydrolysis of appropriate synthetic
peptide substrates conjugated with various chromogenic molecules in which the degree of hydrolysis
is quantified by spectrophotometric (or fluorometric) absorption of the released chromophore
(Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University
Press, New York NY, pp. 25-55). Peptide substrates are designed according to the category of
protease activity as endopeptidase (serine, cysteine, aspartic proteases, or metalloproteases),
aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidases A and B,
procollagen C-proteinase). Commonly used chromogens are 2-naphthylamine, 4-nitroaniline, and
furylacrylic acid. Assays are performed at ambient temperature and contain an aliquot of the enzyme
and the appropriate substrate in a suitable buffer. Reactions are carried out in an optical cuvette, and
the increase/decrease in absorbance of the chromogen released during hydrolysis of the peptide
substrate is measured. The change in absorbance is proportional to the DITHP protease activity in the
assay.

In the alternative, an assay for DITHP protease activity takes advantage of fluorescence
resonance energy transfer (FRET) that occurs when one donor and one acceptor fluorophore with an
appropriate spectral overlap are in close proximity. A flexible peptide linker containing a cleavage
site specific for DITHP is fused between a red-shifted variant (RSGFP4) and a blue variant (BFP5) of
Green Fluorescent Protein. This fusion protein has spectral properties that suggest energy transfer is
occurring from BFP5 to RSGFP4. When the fusion protein is incubated with DITHP, the substrate is
cleaved, and the two fluorescent proteins dissociate. This is accompanied by a marked decrease in
energy transfer which is quantified by comparing the emission spectra before and after the addition of
DITHP (Mitra, R.D. et al. (1996) Gene 173:13-17). This assay can also be performed in living cells.
In this case the fluorescent substrate protein is expressed constitutively in cells and DITHP is
introduced on an inducible vector so that FRET can be monitored in the presence and absence of
DITHP (Sagot, I. et al. (1999) FEBS Lett. 447:53-57).
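
One simple way to express the decrease in energy transfer is the change in the acceptor-to-donor emission ratio before and after DITHP is added. The sketch below is an illustration under that assumption and is not the analysis used in the cited references; the function names and the intensity readings are hypothetical.

```python
def fret_ratio(acceptor_emission: float, donor_emission: float) -> float:
    """Acceptor (RSGFP4) emission divided by donor (BFP5) emission."""
    if donor_emission <= 0:
        raise ValueError("donor_emission must be positive")
    return acceptor_emission / donor_emission


def fractional_fret_loss(ratio_before: float, ratio_after: float) -> float:
    """Fraction of the starting emission ratio lost after cleavage."""
    if ratio_before <= 0:
        raise ValueError("ratio_before must be positive")
    return 1.0 - ratio_after / ratio_before


# Hypothetical peak intensities read from the emission spectra.
before = fret_ratio(acceptor_emission=800.0, donor_emission=400.0)
after = fret_ratio(acceptor_emission=300.0, donor_emission=600.0)
print(f"ratio before={before:.2f} after={after:.2f} "
      f"loss={fractional_fret_loss(before, after):.0%}")
```

Cleavage of the linker lowers acceptor emission and raises donor emission, so the ratio falls as protease activity increases.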

A method to determine the nucleic acid binding activity of DITHP involves a polyacrylamide
gel mobility-shift assay. In preparation for this assay, DITHP is expressed by transforming a
mammalian cell line such as COS7, HeLa, or CHO with a eukaryotic expression vector containing
DITHP cDNA. The cells are incubated for 48-72 hours after transformation under conditions
appropriate for the cell line to allow expression and accumulation of DITHP. Extracts containing
solubilized proteins can be prepared from cells expressing DITHP by methods well known in the art.
Portions of the extract containing DITHP are added to [32P]-labeled RNA or DNA. Radioactive nucleic
acid can be synthesized in vitro by techniques well known in the art. The mixtures are incubated at
25 °C in the presence of RNase- and DNase-inhibitors under buffered conditions for 5-10 minutes.
After incubation, the samples are analyzed by polyacrylamide gel electrophoresis followed by
autoradiography. The presence of a band on the autoradiogram indicates the formation of a complex
between DITHP and the radioactive transcript. A band of similar mobility will not be present in
samples prepared using control extracts prepared from untransformed cells.

In the alternative, a method to determine the methylase activity of a DITHP measures transfer
of radiolabeled methyl groups between a donor substrate and an acceptor substrate. Reaction mixtures
(50 μl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3%
polyvinylalcohol, 1.5 μCi [methyl-3H]AdoMet (0.375 μM AdoMet) (DuPont-NEN), 0.6 μg DITHP,
and acceptor substrate (e.g., 0.4 μg [35S]RNA, or 6-mercaptopurine (6-MP) to 1 mM final
concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then 65 °C for 5 minutes.
Analysis of [methyl-3H]RNA is as follows: 1) 50 μl of 2 x loading buffer (20 mM Tris-HCl, pH 7.6, 1
M LiCl, 1 mM EDTA, 1% sodium dodecyl sulphate (SDS)) and 50 μl oligo d(T)-cellulose (10 mg/ml
in 1 x loading buffer) are added to the reaction mixture, and incubated at ambient temperature with
shaking for 30 minutes. 2) Reaction mixtures are transferred to a 96-well filtration plate attached to a
vacuum apparatus. 3) Each sample is washed sequentially with three 2.4 ml aliquots of 1 x oligo d(T)
loading buffer containing 0.5% SDS, 0.1% SDS, or no SDS. 4) RNA is eluted with 300 μl of
water into a 96-well collection plate, transferred to scintillation vials containing liquid scintillant, and
radioactivity determined. Analysis of [methyl-3H]6-MP is as follows: 1) 500 μl 0.5 M borate buffer,
pH 10.0, and then 2.5 ml of 20% (v/v) isoamyl alcohol in toluene are added to the reaction mixtures. 2)
The samples are mixed by vigorous vortexing for ten seconds. 3) After centrifugation at 700 g for 10
minutes, 1.5 ml of the organic phase is transferred to scintillation vials containing 0.5 ml absolute
ethanol and liquid scintillant, and radioactivity determined. 4) Results are corrected for the
extraction of 6-MP into the organic phase (approximately 41%).
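
The correction in step 4 of the 6-MP analysis can be written out explicitly. The sketch below is one reading of that step and rests on two assumptions that the text does not state: that the counted 1.5 ml aliquot is scaled up to the full 2.5 ml of organic phase, and that the result is then divided by the extraction efficiency of about 0.41. The function name and the example count are hypothetical.

```python
def corrected_6mp_counts(counted_dpm: float,
                         counted_ml: float = 1.5,
                         organic_phase_ml: float = 2.5,
                         extraction_fraction: float = 0.41) -> float:
    """Estimate total methylated 6-MP counts in the reaction.

    counted_dpm          radioactivity measured in the transferred aliquot
    counted_ml           volume of organic phase counted (1.5 ml in the text)
    organic_phase_ml     assumed total organic phase (2.5 ml added in the text)
    extraction_fraction  fraction of 6-MP recovered in the organic phase
    """
    if min(counted_ml, organic_phase_ml) <= 0:
        raise ValueError("volumes must be positive")
    if not 0 < extraction_fraction <= 1:
        raise ValueError("extraction_fraction must be in (0, 1]")
    in_organic_phase = counted_dpm * organic_phase_ml / counted_ml
    return in_organic_phase / extraction_fraction


# Hypothetical scintillation count for one sample.
print(f"{corrected_6mp_counts(1230.0):.0f} dpm after correction")
```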

An assay for adhesion activity of DITHP measures the disruption of cytoskeletal filament
networks upon overexpression of DITHP in cultured cell lines (Rezniczek, G.A. et al. (1998) J. Cell
Biol. 141:209-225). cDNA encoding DITHP is subcloned into a mammalian expression vector that
drives high levels of cDNA expression. This construct is transfected into cultured cells, such as rat
kangaroo PtK2 or rat bladder carcinoma 804G cells. Actin filaments and intermediate filaments such
as keratin and vimentin are visualized by immunofluorescence microscopy using antibodies and
techniques well known in the art. The configuration and abundance of cytoskeletal filaments can be
assessed and quantified using confocal imaging techniques. In particular, the bundling and collapse of
cytoskeletal filament networks is indicative of DITHP adhesion activity.

Alternatively, an assay for DITHP activity measures the expression of DITHP on the cell
surface. cDNA encoding DITHP is transfected into a non-leukocytic cell line. Cell surface proteins are
labeled with biotin (de la Fuente, M.A. et al. (1997) Blood 90:2398-2405). Immunoprecipitations are
performed using DITHP-specific antibodies, and immunoprecipitated samples are analyzed using
SDS-PAGE and immunoblotting techniques. The ratio of labeled immunoprecipitant to unlabeled
immunoprecipitant is proportional to the amount of DITHP expressed on the cell surface.

Alternatively, an assay for DITHP activity measures the amount of cell aggregation induced by
overexpression of DITHP. In this assay, cultured cells such as NIH3T3 are transfected with cDNA
encoding DITHP contained within a suitable mammalian expression vector under control of a strong
promoter. Cotransfection with cDNA encoding a fluorescent marker protein, such as Green
Fluorescent Protein (CLONTECH), is useful for identifying stable transfectants. The amount of cell
agglutination, or clumping, associated with transfected cells is compared with that associated with
untransfected cells. The amount of cell agglutination is a direct measure of DITHP activity.

DITHP may recognize and precipitate antigen from serum. This activity can be measured by
the quantitative precipitin reaction (Golub, E.S. et al. (1987) Immunology: A Synthesis, Sinauer
Associates, Sunderland MA, pages 113-115). DITHP is isotopically labeled using methods known in
the art. Various serum concentrations are added to constant amounts of labeled DITHP.
DITHP-antigen complexes precipitate out of solution and are collected by centrifugation. The amount of
precipitable DITHP-antigen complex is proportional to the amount of radioisotope detected in the
precipitate. The amount of precipitable DITHP-antigen complex is plotted against the serum
concentration. For various serum concentrations, a characteristic precipitation curve is obtained, in
which the amount of precipitable DITHP-antigen complex initially increases proportionately with
increasing serum concentration, peaks at the equivalence point, and then decreases proportionately with
further increases in serum concentration. Thus, the amount of precipitable DITHP-antigen complex is
a measure of DITHP activity which is characterized by sensitivity to both limiting and excess quantities
of antigen.
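
The equivalence point of a precipitin curve can be read off as the serum concentration that gives the largest amount of precipitated label. The sketch below does only that on a discrete dilution series; the function name and the data points are hypothetical, and a real analysis might interpolate between dilutions.

```python
def equivalence_point(serum_concentrations, precipitated_counts):
    """Return (serum concentration, counts) at the peak of the curve."""
    if len(serum_concentrations) != len(precipitated_counts):
        raise ValueError("inputs must have the same length")
    if not precipitated_counts:
        raise ValueError("at least one measurement is required")
    # Index of the maximum precipitated signal (first one if there are ties).
    peak = max(range(len(precipitated_counts)),
               key=lambda i: precipitated_counts[i])
    return serum_concentrations[peak], precipitated_counts[peak]


# Hypothetical dilution series: relative serum concentration vs. cpm in pellet.
serum = [1, 2, 4, 8, 16]
counts = [120, 310, 540, 420, 150]
conc, cpm = equivalence_point(serum, counts)
print(f"equivalence near serum concentration {conc} ({cpm} cpm)")
```

Points to the left of the peak correspond to limiting antigen and points to the right to antigen excess, matching the rise and fall described for the curve.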

A microtubule motility assay for DITHP measures motor protein activity. In this assay,
recombinant DITHP is immobilized onto a glass slide or similar substrate. Taxol-stabilized bovine
brain microtubules (commercially available) in a solution containing ATP and cytosolic extract are
perfused onto the slide. Movement of microtubules as driven by DITHP motor activity can be
visualized and quantified using video-enhanced light microscopy and image analysis techniques.
DITHP motor protein activity is directly proportional to the frequency and velocity of microtubule
movement.

Alternatively, an assay for DITHP measures the formation of protein filaments in vitro. A
solution of DITHP at a concentration greater than the "critical concentration" for polymer assembly is
applied to carbon-coated grids. Appropriate nucleation sites may be supplied in the solution. The
grids are negative stained with 0.7% (w/v) aqueous uranyl acetate and examined by electron
microscopy. The appearance of filaments of approximately 25 nm (microtubules), 8 nm (actin), or 10
nm (intermediate filaments) is a demonstration of protein activity.

DITHP electron transfer activity is demonstrated by oxidation or reduction of NADP.
Substrates such as Asn-βGal, biocytidine, or ubiquinone-10 may be used. The reaction mixture
contains 1-2 mg/ml DITHP, 15 mM substrate, and 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH
7.1 (oxidation reaction), or 2.0 mM NAD(P)H in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction),
in a total volume of 0.1 ml. FAD may be included with NAD, according to methods well known in the
art. Changes in absorbance are measured using a recording spectrophotometer. The amount of
NAD(P)H is stoichiometrically equivalent to the amount of substrate initially present, and the change in
A340 is a direct measure of the amount of NAD(P)H produced; ΔA340 = 6620[NADH]. DITHP activity
is proportional to the amount of NAD(P)H present in the assay. The increase in extinction coefficient
of NAD(P)H coenzyme at 340 nm is a measure of oxidation activity, or the decrease in extinction
coefficient of NAD(P)H coenzyme at 340 nm is a measure of reduction activity (Dalziel, K. (1963) J.
Biol. Chem. 238:2850-2858).
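
The relation between the change in A340 and the NAD(P)H concentration is a direct application of the Beer-Lambert law. In the sketch below the molar absorptivity is an explicit argument instead of a built-in constant: the text's relation uses 6620, while 6220 per M per cm is the value commonly tabulated for NADH at 340 nm, so the caller should supply whichever coefficient applies. The function names, the 1 cm path length, and the example reading are assumptions.

```python
def nadph_molar(delta_a340: float, epsilon_per_m_cm: float,
                path_cm: float = 1.0) -> float:
    """NAD(P)H concentration (mol/L) from a change in absorbance at 340 nm."""
    if epsilon_per_m_cm <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return delta_a340 / (epsilon_per_m_cm * path_cm)


def nadph_nmol(delta_a340: float, epsilon_per_m_cm: float,
               volume_ml: float = 0.1, path_cm: float = 1.0) -> float:
    """Amount of NAD(P)H (nmol) in the assay volume (0.1 ml in the text)."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    conc = nadph_molar(delta_a340, epsilon_per_m_cm, path_cm)
    return conc * (volume_ml / 1000.0) * 1e9  # mol -> nmol


# Assumed example reading; epsilon chosen by the caller.
print(f"{nadph_nmol(delta_a340=0.311, epsilon_per_m_cm=6220.0):.2f} nmol NAD(P)H")
```
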

DITHP transcription factor activity is measured by its ability to stimulate transcription of a
reporter gene (Liu, H.Y. et al. (1997) EMBO J. 16:5289-5298). The assay entails the use of a well
characterized reporter gene construct, LexAop-LacZ, that consists of LexA DNA transcriptional control
elements (LexAop) fused to sequences encoding the E. coli LacZ enzyme. The methods for constructing
and expressing fusion genes, introducing them into cells, and measuring LacZ enzyme activity, are well
known to those skilled in the art. Sequences encoding DITHP are cloned into a plasmid that directs the
synthesis of a fusion protein, LexA-DITHP, consisting of DITHP and a DNA binding domain derived
from the LexA transcription factor. The resulting plasmid, encoding a LexA-DITHP fusion protein, is
introduced into yeast cells along with a plasmid containing the LexAop-LacZ reporter gene. The amount
of LacZ enzyme activity associated with LexA-DITHP transfected cells, relative to control cells, is
proportional to the amount of transcription stimulated by the DITHP.
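
Reporter activity relative to control cells is usually summarized as a fold activation. The sketch below assumes replicate LacZ activity readings in arbitrary units and takes the ratio of their means; the function name and the readings are hypothetical.

```python
from statistics import mean


def fold_activation(test_units, control_units) -> float:
    """Mean reporter activity of test cells divided by that of control cells."""
    if not test_units or not control_units:
        raise ValueError("both groups need at least one reading")
    control_mean = mean(control_units)
    if control_mean <= 0:
        raise ValueError("control activity must be positive")
    return mean(test_units) / control_mean


# Hypothetical LacZ readings (arbitrary units) from replicate yeast cultures.
lexa_dithp = [120.0, 130.0, 110.0]
lexa_only = [10.0, 12.0, 8.0]
print(f"fold activation: {fold_activation(lexa_dithp, lexa_only):.1f}")
```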

Chromatin activity of DITHP is demonstrated by measuring sensitivity to DNase I (Dawson,
B.A. et al. (1989) J. Biol. Chem. 264:12830-12837). Samples are treated with DNase I, followed by
insertion of a cleavable biotinylated nucleotide analog,
5-[(N-biotinamido)hexanoamido-ethyl-1,3-thiopropionyl-3-aminoallyl]-2'-deoxyuridine 5'-triphosphate,
using nick-repair techniques well known to those skilled in the art. Following purification and
digestion with EcoRI restriction endonuclease, biotinylated sequences are affinity isolated by
sequential binding to streptavidin and biotincellulose.

Another specific assay demonstrates the ion conductance capacity of DITHP using an
electrophysiological assay. DITHP is expressed by transforming a mammalian cell line such as
COS7, HeLa, or CHO with a eukaryotic expression vector encoding DITHP. Eukaryotic expression
vectors are commercially available, and the techniques to introduce them into cells are well known to
those skilled in the art. A small amount of a second plasmid, which expresses any one of a number of
marker genes such as β-galactosidase, is co-transformed into the cells in order to allow rapid
identification of those cells which have taken up and expressed the foreign DNA. The cells are
incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow
expression and accumulation of DITHP and β-galactosidase. Transformed cells expressing
β-galactosidase are stained blue when a suitable colorimetric substrate is added to the culture media
under conditions that are well known in the art. Stained cells are tested for differences in membrane
conductance due to various ions by electrophysiological techniques that are well known in the art.
Untransformed cells, and/or cells transformed with either vector sequences alone or β-galactosidase
sequences alone, are used as controls and tested in parallel. The contribution of DITHP to cation or
anion conductance can be shown by incubating the cells with antibodies specific for DITHP.
The antibodies will bind to the extracellular side of DITHP, thereby blocking the pore in
the ion channel, and the associated conductance.

XV. Functional Assays

DITHP function is assessed by expressing dithp at physiologically elevated levels in
mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a
strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT
(Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which contain the
cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell
line, preferably of endothelial or hematopoietic origin, using either liposome formulations or
electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are
co-transfected.

Expression of a marker protein provides a means to distinguish transfected cells from
nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector.
Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a
CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is used
to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the
cells and other cellular properties.

FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding
or coincident with cell death. These events include changes in nuclear DNA content as measured by
staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward
light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by
decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular
proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane
composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell
surface. Methods in flow cytometry are discussed in Ormerod, M.G. (1994) Flow Cytometry, Oxford,
New York NY.

The influence of DITHP on gene expression can be assessed using highly purified populations
of cells transfected with sequences encoding DITHP and either CD64 or CD64-GFP. CD64 and
CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human
immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using
magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc., Lake Success
NY). mRNA can be purified from the cells using methods well known by those of skill in the art.
Expression of mRNA encoding DITHP and other genes of interest can be analyzed by northern analysis
or microarray techniques.

XVI. Production of Antibodies

DITHP substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g.,
Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to
immunize rabbits and to produce antibodies using standard protocols.

Alternatively, the DITHP amino acid sequence is analyzed using LASERGENE software
(DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is synthesized
and used to raise antibodies by means known to those of skill in the art. Methods for selection of
appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in
the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.)

Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide
synthesizer (PE Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with
N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g.,
Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's
adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to
plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with
radio-iodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-DITHP activity
using protocols well known in the art, including ELISA, RIA, and immunoblotting.

XVII. Purification of Naturally Occurring DITHP Using Specific Antibodies

Naturally occurring or recombinant DITHP is substantially purified by immunoaffinity
chromatography using antibodies specific for DITHP. An immunoaffinity column is constructed by
covalently coupling anti-DITHP antibody to an activated chromatographic resin, such as
CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is
blocked and washed according to the manufacturer's instructions.

Media containing DITHP are passed over the immunoaffinity column, and the column is
washed under conditions that allow the preferential absorbance of DITHP (e.g., high ionic strength
buffers in the presence of detergent). The column is eluted under conditions that disrupt
antibody/DITHP binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such
as urea or thiocyanate ion), and DITHP is collected.

XVIII. Identification of Molecules Which Interact with DITHP

DITHP, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent.
(See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules
previously arrayed in the wells of a multi-well plate are incubated with the labeled DITHP, washed, and
any wells with labeled DITHP complex are assayed. Data obtained using different concentrations of
DITHP are used to calculate values for the number, affinity, and association of DITHP with the
candidate molecules.
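
The text does not name the method used to derive the number and affinity of binding sites from the concentration series. One conventional choice, shown here purely as an assumption, is a single-site Scatchard analysis, in which bound/free is regressed on bound so that the slope equals -1/Kd and the x-intercept equals Bmax. The function name and the data are hypothetical.

```python
def scatchard_fit(bound, free):
    """Single-site Scatchard fit: bound/free = (Bmax - bound) / Kd.

    Returns (kd, bmax); kd is in the units of `free`, bmax in those of `bound`.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired measurements")
    x = [float(b) for b in bound]
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("bound values must not all be identical")
    slope = sxy / sxx                      # equals -1 / Kd
    if slope >= 0:
        raise ValueError("a single-site fit requires a negative slope")
    intercept = mean_y - slope * mean_x    # equals Bmax / Kd
    kd = -1.0 / slope
    return kd, intercept * kd


# Hypothetical binding data (bound and free labeled DITHP, same units).
kd, bmax = scatchard_fit(bound=[20.0, 50.0, 75.0], free=[2.5, 10.0, 30.0])
print(f"Kd={kd:.1f} Bmax={bmax:.1f}")
```

Nonlinear fitting of the binding isotherm is often preferred in practice; the linear form is used here only because it is short and easy to check by hand.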

Alternatively, molecules interacting with DITHP are analyzed using the yeast two-hybrid
system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially
available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH).

DITHP may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT)
which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions
between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent
No. 6,057,101).

All publications and patents mentioned in the above specification are herein incorporated by
reference. Various modifications and variations of the described method and system of the invention
will be apparent to those skilled in the art without departing from the scope and spirit of the invention.
Although the invention has been described in connection with specific preferred embodiments, it should
be understood that the invention as claimed should not be unduly limited to such specific embodiments.
Indeed, various modifications of the above-described modes for carrying out the invention which are
obvious to those skilled in the field of molecular biology or related fields are intended to be within the
scope of the following claims.
TABLE 3

SEQ ID NO:  Template ID  Start  Stop  Frame      Domain Type
2           404508.3.j   468    548   reverse 3  TM
2           404508.3.j   549    626   forward 3  SP
2           404508.3.j   348    428   reverse 3  TM
2           404508.3.j   430    513   reverse 1  SP
2           404508.3.j   459    539   forward 3  TM
3           441227.2.j   523    606   forward 1  SP
6           13039.2      1507   1581  forward 1  TM
7           238005.4     887    973   forward 2  SP
7           238005.4     1691   1783  forward 2  SP
7           238005.4     1583   1678  forward 2  SP
7           238005.4     566    643   forward 2  TM
9           348094.6.j   1994   2077  forward 2  TM
9           348094.6.j   1871   1948  reverse 2  TM
9           348094.6.j   2046   2117  forward 3  TM
9           348094.6.j   2112   2195  reverse 3  TM
9           348094.6.j   2035   2115  forward 1  TM
9           348094.6.j   2920   3000  forward 1  TM
10          233678.1     2412   2495  forward 3  SP
11          312243.1     528    614   forward 3  SP
16          346078.5.j   1406   1492  forward 2  TM
17          394637.1.j   36     122   forward 3  TM
18          222429.3     646    726   forward 1  SP
19          366739.2     791    880   forward 2  SP
19          366739.2     1116   1207  forward 2  SP
20          474635.6     1293   1370  forward 3  TM
22          407090.5.j   2955   3038  reverse 3  SP
24          411449.2     283    366   forward 1  SP
27          445433.2     123    203   forward 3  TM
29          257121.2     3656   3736  forward 2  TM
29          257121.2     851    934   forward 2  TM
46          338992.1     1601   1690  forward 2  SP
46          338992.1     1478   1561  forward 2  SP
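
The Start, Stop, and Frame values of Table 3 can be cross-checked with a little arithmetic. The sketch below assumes, without confirmation from the text, that Start and Stop are inclusive 1-based nucleotide positions on the template, so a domain spans (Stop - Start + 1) / 3 codons, and that a forward frame number equals ((Start - 1) mod 3) + 1. Both conventions are inferred from the table's own rows, and the function names are hypothetical.

```python
def domain_length_aa(start: int, stop: int) -> int:
    """Number of codons covered by an inclusive nucleotide interval."""
    n_nt = stop - start + 1
    if n_nt <= 0:
        raise ValueError("stop must not be smaller than start")
    if n_nt % 3 != 0:
        # A span that is not a whole number of codons suggests a misread value.
        raise ValueError("interval length is not a multiple of three")
    return n_nt // 3


def forward_frame(start: int) -> int:
    """Reading frame (1, 2 or 3) implied by a 1-based forward start position."""
    if start < 1:
        raise ValueError("start must be 1 or greater")
    return (start - 1) % 3 + 1


# Two rows listed for SEQ ID NO:2 in Table 3.
print(domain_length_aa(468, 548), domain_length_aa(549, 626))
print(forward_frame(549))
```

A row whose span is not a multiple of three, or whose forward frame disagrees with its start position, is a candidate for a transcription error in the table.
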
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TABLE 4 



ID NO: 


Template ID 


Component ID 


Start 


Stop 


4 


277927.2 


2706884H1 


1 


199 


4 


277927.2 


4317170F6 


1 


384 


4 


277927.2 


2706884F6 


1 


320 


4 


277927.2 


gl062337 


36 


388 


4 


277927.2 


gl064195 


39 


230 


4 


277927.2 


2862653H1 


42 


117 


4 


277927.2 


494376H1 


53 


305 


4 


277927.2 


494376R7 


53 


383 


4 


277927.2 


494376R6 


53 


492 


4 


277927.2 


g1276473 


57 


586 


4 


277927.2 


92055898 


69 


643 


4 


277927.2 


2707487H1 


64 


162 


4 


277927.2 


491 071 HI 


74 


319 


4 


277927.2 


Q1975146 


80 


403 


>i 


277927.2 


91142488 


84 


471 


4 


277927.2 


91839772 


84 


214 


4 


277927.2 


91012520 


84 


292 


4 


277927.2 


3396290H1 


86 


344 


4 


277927.2 


6896091 HI 


246 


523 


4 


277927.2 


9946240 


246 


551 


4 


277927.2 


9943428 


247 


449 


6 


476311.1 


1948746H1 


1 


233 


5 


475311.1 


9614088 


12 


271 


5 


475311.1 


5644994H1 


32 


287 


5 


475311.1 


9645077 


37 


260 


5 


475311.1 


9668814 


40 


366 


5 


475311.1 


9645713 


42 


263 


5 


475311.1 


9645045 


42 


359 


5 


476311.1 


4297187H1 


46 


315 


5 


475311.1 


3486028H1 


63 


386 


5 


476311.1 


9683448 


64 


438 


5 


475311.1 


9645139 


54 


252 


5 


475311.1 


9645138 


54 


316 


5 


475311.1 


9670408 


54 


433 


5 


476311.1 


9646240 


66 


444 


6 


475311.1 


9562499 


56 


444 


5 


475311.1 


94113162 


56 ' 


511 


5 


475311.1 


92003099 


69 


473 


5 


475311.1 


9815985 


71 


481 


5 


475311.1 


9674342 


77 


421 


5 


475311.1 


4161149H1 


88 


355 


5 


475311.1 


3843963H1 


88 


393 


5 


475311.1 


5029373H1 


92 


346 


5 


475311.1 


3584334H1 


98 


435 


5 


475311.1 


93016753 


105 


441 


5 


475311.1 


9831378 


106 


457 


5 


475311.1 


9900504 


105 


401 


5 


475311.1 


9817737 


105 


402 


5 


475311.1 


9817003 


106 


418 


5 


475311.1 


9812268 


106 


397 
TABLE 4



SEQ ID NO:


Template ID 


Component ID 


Start 


Stop 


5 


475311.1


g2052991 


105 


462 


5 


475311.1 


g3922575 


105 


509 


5


475311.1


g2884797 


105 


548 


5 


475311.1


92354824 


105 


375 


5 


475311.1 


92788503 


105 


214 


5 


475311.1 


9797560 


105 


342 


5 


475311.1


9820611 


105 


346 


5 


475311.1


9821223 


105


347 


5 


475311.1 


g1678742 


138 


543 


5 


475311.1


5590296H1 


150 


396 


5 


475311.1


2446939H1 


165 


407 


5 


475311.1 


4178404H1 


164 


434 


5 


475311.1


4970343H1 


179 


455 


5 


475311.1 


g1720508


218 


502 


5 


475311.1


4816085H1


242 


326 


5 


475311.1 


g1296172


248 


789 


5 


475311.1 


5778021H1 


256 


544 


5 


475311.1 


1579546F6 


263 


712 


5


475311.1 


1227469H1 


263 


535 


5 


475311.1 


1951396H1 


287 


527 


5 


475311.1 


g1980791


307 


631 


5 


475311.1 


3369048H1 


322 


455 


5 


475311.1 


261669H1 


325 


559 


5 


475311.1


1005656H1 


344 


651 


5 


475311.1 


g1987083


364 


686 


5 


475311.1 


3163327H1 


364 


683 


5 


475311.1 


5086472H1 


372 


552 


5 


475311.1 


92038507 


365 


782 


5 


475311.1 


3791124H1 


366 


623 


5 


475311.1


3342191H1 


383 


665 


5 


475311.1 


2727001H1


403 


658 


5


475311.1 


6691335H1 


42} 


697 


5 


475311.1 


2246168H1


434 


702 


5 


475311.1 


4186012H1 


582 


933 


5 


475311.1 


2630947H1 


653 


899 


5 


475311.1 


g1444007


793 


1005 


6 


13039.2 


4189083H1 


3573 


3883 


6 


13039.2 


93802900 


3582 


3888 


6 


13039.2 


2881380H1 


3588 


3897 


6 


13039.2 


9864431 


3590 


3852 


6 


13039.2 


91481926 


3594 


3881 


6 


13039.2 


9667472 


3600 


3883 


6 


13039.2 


93154327 


3607 


3891 


6 


13039.2 


4583956H1 


3611 


3880 


6 


13039.2 


9776778 


3615 


3887 


6 


13039.2 


2381092H1 


3622 


3854 


6 


13039.2 


3810284H1 


3623 


3895 


6 


13039.2 


4906362H2 


3^4 


3904 


6 


13039.2 


2202368H1 


3632 


3875 


6 


13039.2 


94072852 


3636 


4070 


6 


13039.2 


2561132H1 


3646 


3748 


6 


13039.2 


1294323F1 


3648 


3884 


6 


13039.2 


1294323H1 


3648 


3780 


6 


13039.2 


446234H1 


3649 


3871 


6 


13039.2 


4774923H1 


3650 


3935 


6 


13039.2 


9789650 


3651 


3882 


6 


13039.2 


g1220073 


3652 


3888 


6 


13039.2 


9776622 


3658 


3886 


6 


13039.2 


g1522080


3663 


3883 


6 


13039.2 


449477H1 


3667 


3875 


6 


13039.2 


2293709H1 


3669 


3864 


6 


13039.2 


glll49% 


3695 


3885 


6 


13039.2 


9667182 


3700 


3883 


6 


13039.2 


91162269 


3703 


4063 


6 


13039.2 


g1773892


3703 


4081 


6 


13039.2 


93887238 


3705 


4083 


6 


13039.2 


3542520H1 


3707 


3870 


6 


13039.2 


4633040H1 


3727 


4003 


6 


13039.2 


91128515 


3763 


4081 


6 


13039.2 


2804481H1


3793 


3899 


6 


13039.2 


556467R6 


3816 


3871 


6 


13039.2 


656467H1 


3816 


3871 


6 


13039.2 


9683155 


3819 


4075 


6 


13039.2 


9878104 


3824 


4075 


6 


13039.2 


2945643H1 


3841 


4076 


6 


13039.2 


9821501 


3869 


4087 


6 


13039.2 


9782508 


3869 


4083 


6 


13039.2 


2590326H2 


4016 


4075 


6 


13039.2 


4535971T6 


1081 


1596 


6 


13039.2 


2887511H1


1137 


1386 


6 


13039.2 


5060591H1 


1078 


1278 


6 


13039.2 


2261086H1 


1154 


1385 


6 


13039.2 


3535673H1 


1214 


1493 


6 


13039.2 


1951291H1 


1264 


1453 


6 


13039.2 


488798H1 


1330 


1587 


6 


13039.2 


2634402H1 


1381 


1594 


6 


13039.2 


2135368H1 


1421 


1684 


6 


13039.2 


2135368F6 


1421 


1863 


6 


13039.2 


2837860T6 


3920 


4029 


6 


13039.2 


2837860F6 


3927 


4069 


6 


13039.2 


2837860H1 


3927 


4069 


6 


13039.2 


3536266H1 


3936 


4027 


6 


13039.2 


3536268H1 


3937 


4038 


6 


13039.2 


9776777 


3055 


3279 


6 


13039.2 


91856268 


3100 


3505 


6 


13039.2 


2604507H1 


3110 


3384 


6 


13039.2 


5065865H1 


3071 


3344 


6 


13039.2 


3472443H1 


3077 


3330 


6 


13039.2 


5017370H1 


3114 


3384 


6 


13039.2 


2767212H1 


3090 


3334 



SEQ ID NO: Template ID Component ID Start Stop
6 13039.2 790129R1 3121 3597
6 13039.2 2690376H1 3094 3357
6 13039.2 2411049H1 3099 3334
6 13039.2 g1976982 1594 1966
6 13039.2 2642416H1 1599 1728
6 13039.2 g1476955 1606 2005
6 13039.2 g1062967 1606 1901
6 13039.2 g1481925 1606 1784
6 13039.2 3136852H1 1738 2037
6 13039.2 2500241H1 1778 2029
6 13039.2 2788782H1 1863 2121
6 13039.2 874642H1 1862 2024
6 13039.2 g1200951 1869 2062
6 13039.2 4535987H1 1879 2008
6 13039.2 4536041H1 1864 2151
6 13039.2 1921318T6 3558 4028
6 13039.2 g2458968 3562 3880
6 13039.2 3434402H1 1 237
6 13039.2 g1774870 25 371
6 13039.2 6090377H1 28 270
6 13039.2 5090377F6 28 341
6 13039.2 g2243533 71 456
6 13039.2 4535971H1 377 641
6 13039.2 4635971F6 377 900
6 13039.2 2738090H1 634 871
6 13039.2 2738090F6 634 1162
6 13039.2 4032617H1 652 788
6 13039.2 3247382H1 687 966
6 13039.2 2886124H1 2974 3263
6 13039.2 2886134H1 2974 3255
6 13039.2 2882975H1 2974 3267
6 13039.2 731806R1 2984 3499
6 13039.2 3715202H1 2984 3092
6 13039.2 g1993687 2985 3312
6 13039.2 731806H1 2984 3258
6 13039.2 2468144H1 2990 3229
6 13039.2 1907815H1 2991 3218
6 13039.2 4148570H1 3006 3269
6 13039.2 g778804 3016 3330
6 13039.2 3256396H1 3017 3269
6 13039.2 g789649 3021 3255
6 13039.2 4983180H1 3028 3311
6 13039.2 g865926 3030 3351
6 13039.2 4983190H1 3028 3302
6 13039.2 998221R1 3039 3594
6 13039.2 998221H1 3039 3328
6 13039.2 4650828H1 3041 3328
6 13039.2 g776667 3055 3330
6 13039.2 3699551H1 769 1043
6 13039.2 2784137H1 768 999


6 


13039.2 


6092174H1


863 


1136 


6 


13039.2 


2736123H1 


901 


1143 


6 


13039.2 


2736123F6 


901 


1374 


6 


13039.2 


2841123H2 


964 


1052 


6 


13039.2 


g1801914 


984 


1330 


6 


13039.2 


3627415H1 


990 


1242 


6 


13039.2 


2049675H1 


1064 


1318 


6 


13039.2 


2783785F6 


1067 


1509 


6 


13039.2 


2783785H1 


1067 


1319 


6 


13039.2 


4764557H1 


2700 


2895 


6 


13039.2 


g862228 


2711 


2935 


6 


13039.2 


4977406H1 


2718 


2987 


6 


13039.2 


4324144H1 


2725 


3007 


6 


13039.2 


5195156H1 


2730 


2831 


6 


13039.2 


4625438H1 


2738 


2992 


6 


13039.2 


2881531H1 


2740 


3023 


6 


13039.2 


1545834H1 


2740 


2953 


6 


13039.2 


3214807H1 


2740 


2990 


6 


13039.2 


1312356H1 


2750 


2954 


6 


13039.2 


1312307H1 


2750 


2979 


6 


13039.2 


g1238200


3569 


3881 


6 


13039.2 


g792128 


3570 


3881 


6 


13039.2 


g3109745 


3571 


3878 


6 


13039.2 


g789837 


2241 


2520 


6 


13039.2 


g666844 


2248 


2506 


6 


13039.2 


g793332 


2260 


2481 


6 


13039.2 


g862227 


2260 


2522 


6 


13039.2 


3321110H1 


2293 


2570 


6 


13039.2 


g1636044


2306 


2530 


6 


13039.2 


3738658H1 


2316 


2611 


6 


13039.2 


5700029H1 


2317 


2575 


6 


13039.2 


5030623H1 


2361 


2636 


6 


13039.2 


3321569H2 


2364 


2614 


6 


13039.2 


3484339H1 


2365 


2701 


6 


13039.2 


g317338 


2375 


2655 


6 


13039.2 


4922854H1 


2377 


2683 


6 


13039.2 


1382370H1 


2377 


2615 


6 


13039.2 


3364673H1 


2393 


2653 


6 


13039.2 


4014451H1


2393 


2645 


6 


13039.2 


4296168H1


2429 


2676 


6 


13039.2 


4296144H1 


2429 


2683 


6 


13039.2 


4295505H1 


2429 


2696 


6 


13039.2 


4775793H1 


2466 


2741 


6 


13039.2 


6190065H1 


1477 


1620 


6 


13039.2 


g2013579 


1518 


1807 


6 


13039.2 


3201887H1 


1570 


1835 


6


13039.2 


g708914 


1436 


1737 


6 


13039.2 


g691657 


1479 


1801 


6 


13039.2 


489678H1 


1579 


1818 


6 


13039.2 


044642H1 


2172 


2328 
PCT/US00/15404





6 


13039.2 


3129828H1 


2163 


2472 


6 


13039.2 


5702193H1 


2183 


2423 


6 


13039.2 


g866587 


2185 


2551 


6 


13039.2 


4968731H1 


2217 


2474 


6 


13039.2 


4969450H1 


2217 


2498 


6 


13039.2 


g434208 


2221 


2510 


6 


13039.2 


g666222 


3533 


3878 


6 


13039.2 


1861085T6 


3554 


3841 


6 


13039.2 


g1476956


3554 


3883 


6 


13039.2 


2472134H1 


1917 


2158 


6 


13039.2 


2460193H1 


1973 


2223 


6 


13039.2 


5544753H1 


1977 


2125 


6 


13039.2 


1894461H1 


1994 


2224 


6 


13039.2 


4989013H1 


2004 


2267 


6 


13039.2 


2504842H1 


2018 


2250 


6 


13039.2 


3817686H1 


2039 


2289 


6 


13039.2 


1709583F6 


2086 


2637 


6 


13039.2 


1709683H1 


2086 


2292 


6 


13039.2 


5496588H1 


2117 


2221 


6 


13039.2 


g618254 


2133 


2456 


6 


13039.2 


2302357H2 


2134 


2358 


6 


13039.2 


3076667H1 


2142 


2408 


6 


13039.2 


g4096030 


3440 


3883 


6 


13039.2 


g4394182 


3444 


3889 


6 


13039.2 


g3298631 


3450 


3883 


6 


13039.2 


g4152977 


3452 


3888 


6 


13039.2 


4150667H1 


3456 


3752 


6 


13039.2 


5036313H1 


3454 


3744 


6 


13039.2 


g4110089


3455 


3887 


6


13039.2 


g3110528


3467 


3881 


6 


13039.2 


g3884368 


3468 


3683 


6 


13039.2 


g2631703 


3472 


3882 


6 


13039.2 


g4329108 


3483 


3871 


6 


13039.2 


g4307404 


3484 


3891 


6 


13039.2 


g3887724 


3487 


3890 


6 


13039.2 


5679544H1 


3488 


3784 


6 


13039.2 


4643394H1 


3488 


3741 


6 


13039.2 


231550R1 


3496 


3891 


6 


13039.2 


231650H1 


3496 


3719 


6 


13039.2 


g2659034 


3504 


3678 


6 


13039.2 


3635404H1 


3504 


3814 


6 


13039.2 


2736123T6


3615 


3839 


6 


13039.2 


231550F1 


3523 


3882 


6 


13039.2 


g3596402 


3524 


3881 


6 


13039.2 


2736217H1


3526 


3796 


6 


13039.2 


2368391H2


3323 


3556 


6 


13039.2 


4761918H1 


3324 


3582 


6 


13039.2 


3162730H1 


3331 


3637 


6 


13039.2 


2911248H1 


3338 


3625 


6 


13039.2 


780353H1 


3345 


3665 


6 


13039.2 


1233680H1 


3349 


3681 


6 


13039.2 


1233680F1 


3349 


3971 


6 


13039.2 


3792837H1 


3351 


3635 


6 


13039.2 


92357493 


3351 


3602 


6 


13039.2 


2005695H1 


3354 


3432 


6 


13039.2 


93087116 


3355 


3891 


6 


13039.2 


3038679H1 


3359 


3631 


6 


13039.2 


731806F1 


3361 


3882 


6 


13039.2 


93070256 


3362 


3878 


6


13039.2 


93049253 


3364 


3885 


6 


13039.2 


93049251 


3366 


3885 


6 


13039.2 


93070255 


3369 


3878 


6 


13039.2 


93076969 


3370 


3878 


6 


13039.2 


94292218 


3374 


3875 


6 


13039.2 


2379289H1 


3376 


3605 


6 


13039.2 


6026190H1 


3375 


3685 


6 


13039.2 


2887871H1 


3378 


3685 


6 


13039.2 


93841328 


3384 


3878 


6 


13039.2


94073143 


3385 


3881 


6 


13039.2 


602339H1 


3386 


3668 


6 


13039.2 


92901329 


3390 


3885 


6 


13039.2 


94153184 


3401 


3882 


6 


13039.2 


1862447T6 


3417 


3836 


6 


13039.2 


2738090T6 


3429 


3840 


6 


13039.2 


1709583T6 


3431 


3837 


6 


13039.2 


2870017H1 


3229 


3528 


6 


13039.2 


2859049H1 


3229 


3512 


6 


13039.2 


1822377H1 


3237 


3499 


6 


13039.2 


3601353H1 


3245 


3577 


6 


13039.2 


2760117H1


3247 


3543 


6 


13039.2 


2751935H1 


3247 


3520 


6 


13039.2 


1921318H1 


3255 


3535 


6 


13039.2 


1921318R6 


3255 


3696 


6 


13039.2 


4797434H1 


3262 


3539 


6 


13039.2 


9778894 


3266 


3611 


6 


13039.2 


9855990 


3266 


3591 


6 


13039.2 


2135368T6 


3266 


3840 


6 


13039.2 


3975369H1 


3275 


3387 


6 


13039.2 


3442784H1 


3283 


3547 


6 


13039.2 


1832656H1 


3299 


3596 


6 


13039.2 


2219316H1 


3317 


3673 


6 


13039.2 


4761911H1


3323 


3598 


6 


13039.2 


91238488 


3188 


3379 


6 


13039.2 


9877900 


3195 


3513 


6 


13039.2 


92148718 


3196 


3847 


6 


13039.2 


1736290H1 


3199 


3439 


6 


13039.2 


1734801H1


3199 


3448 


6 


13039.2 


1734817H1 


3199 


3447 


6 


13039.2 


3812780H1 


3201 


3506 


6 


13039.2 


3814117H1


3202 


3367 
WO 00/73509





6 


13039.2 


2404648H1 


3213 


3450 


6 


13039.2 


1736463H1 


3215


3509 


6 


13039.2 


5019558H1


3217 


3470 


6 


13039.2 


5559286H1 


3217 


3498 


6 


13039.2 


793056H1 


3123 


3357 


6 


13039.2 


790129H1


3134 


3376 


6 


13039.2 


2671569H1 


3169 


3272 


6 


13039.2 


4255130H1


3176 


3267 


6 


13039.2 


4164769H1 


3170 


3434 


6 


13039.2 


4255176H1


3175 


3459 


6 


13039.2 


1861085F6 


3562 


3882 


6 


13039.2 


g1203064


3563 


3884 


6 


13039.2 


1861177H1 


3563 


3888 


6 


13039.2 


2803220H1 


2525


2792 


6 


13039.2 


4879807H1


2476 


2740 


6 


13039.2 


1627449H1 


2527 


2661 


6 


13039.2 


2660611H1


2481 


2734 


6 


13039.2 


3449673H1 


2502 


2757 


6 


13039.2 


5039738H2 


2535 


2749 


6 


13039.2 


4595615H1 


2538 


2813 


6 


13039.2 


1862447F6 


2566 


3056 


6 


13039.2 


1862447H1 


2566 


2831 


6 


13039.2 


4251362H1 


2587 


2852 


6 


13039.2 


g864534 


2601 


2918 


6 


13039.2 


91949105 


2608 


2883 


6 


13039.2 


4831252H1 


2612 


2740 


6 


13039.2 


4830165H1 


2612 


2824 


6 


13039.2 


91522198 


2629 


2917 


6 


13039.2 


2962002H1 


2642 


2938 


6 


13039.2 


4111111H1 


2668 


2941 


6 


13039.2 


45818ieH1 


2674 


2970 


6 


13039.2 


3480168H1 


2674 


2966 


6 


13039.2 


059851H1


2688 


2875 


6 


13039.2 


3463785H1 


2827 


3087 


6 


13039.2 


1379162H1 


2833 


3074 


6 


13039.2 


827278H1 


2833 


3141 


6 


13039.2 


1379162F1 


2833 


3380 


6 


13039.2 


827278R1 


2833 


3428 


6 


13039.2 


4970664H1 


2852 


3138 


6 


13039.2 


3765903H1 


2856 


3173 


6 


13039.2 


9828803 


2876 


3107 


6 


13039.2 


2912660H1 


2876 


3139 


6 


13039.2 


2945011H1


2929 


3231 


6 


13039.2 


1900343H1 


2930 


3184 


6 


13039.2 


4177622H1 


2939 


3215 


6 


13039.2 


2714979H1 


2941 


3205 


6 


13039.2 


5098921H1


2946 


3241 


6 


13039.2 


4589878H1 


2963 


3194 


6 


13039.2 


1350496H1 


2966 


3227 


6 


13039.2 


1350496F1 


2966 


3639 





6 


13039.2 


g793050 


2751 


2936


6 


13039.2 


2396484H1 


2751 


2985 


6 


13039.2 


3677377H1 


2764 


2931 


6 


13039.2 


1613610H1 


2765 


2976 


6 


13039.2 


3677367H1 


2766 


3072 


6 


13039.2 


3813522H1 


2772 


3037 


6 


13039.2 


4688970H1 


2782 


3032 


6 


13039.2 


4889047H1 


2811 


3096 


6 


13039.2 


4550845H1 


2811 


2992 


6 


13039.2 


g1986776


2811 


3068 


6 


13039.2 


2943108H1 


2816 


3136 


6 


13039.2 


919863H1 


2816 


3094 


6 


13039.2 


2220586H1 


2818 


3075 


7 


238005.4 


3494842H1 


1 


146 


7


238005.4


1513955H1


19 


211 


7 


238005.4 


1513931H1 


19 


189 


7 


238005.4 


1895354H1 


24 


255 


7 


238005.4 


3216547H1 


53 


287 


7 


238005.4 


2492948H1 


210 


525 


7 


238005.4 


3393775H1 


226 


487 


7 


238005.4 


3110735H1 


237 


520 


7 


238005.4 


2114053H1 


238 


500 


7 


238005.4 


2639154H1 


229 


480 


7 


238005.4 


2639154F6


229 


615 


7 


238005.4 


3489530H1 


251 


540 


7 


238005.4 


2452873H1 


253 


499 


7 


238005.4 


2452873F6 


253 


419 


7 


238005.4 


3149681H1 


258 


499 


7 


238005.4 


2952379H1 


265 


542 


7 


238005.4 


2466034H1 


291 


507 


7 


238005.4 


3074678H1 


330 


611 


7 


238005.4 


g2003143 


378 


830 


7 


238005.4 


2310518R6 


454 


935 


7 


238005.4 


2310518H1 


454 


735 


7 


238005.4 


2466818H1


500 


732 


7 


238005.4


3243701H1


503 


764 


7 


238005.4 


3108380H1 


506 


773 


7 


238005.4 


4250192H1 


506 


710 


7 


238005.4 


3688838H1 


527 


810 


7 


238005.4 


2312916H1 


543 


816 




7


238005.4


4087821H1 


570 


878 




7


238005.4


93173776 


593 


909 




7


238005.4


3960672H2 


599 


874 




7


238005.4


1830753H1 


644 


897 




7


238005.4


3936919H1 


690 


871 




7


238005.4


3183381H1 


691 


942 




7


238005.4


3165765H1 


714 


1037 




7


238005.4


3231218H1 


762 


1012 




7


238005.4


5273755H1 


763 


1032 




7


238005.4


4710712H1 


876 


997 




SEQ ID NO: Template ID Component ID Start Stop
7 238005.4 g847384 989 1313
7 238005.4 5610260H1 999 1303
7 238005.4 2046234F6 1017 1417
7 238005.4 4547319H1 1024 1320
7 238005.4 5302724H1 1041 1284
7 238005.4 5267284H1 1053 1283
7 238005.4 4862949H1 1072 1245
7 238005.4 2893053H1 1078 1355
7 238005.4 2893053F6 1078 1449
7 238005.4 3408366H1 1120 1378
7 238005.4 5427753H1 1132 1350
7 238005.4 2046234H1 1135 1417
7 238005.4 g846687 1153 1408
7 238005.4 1978835H1 1173 1431
7 238005.4 5187046H1 1212 1303
7 238005.4 5049433F6 1212 1740
7 238005.4 5069619H1 1213 1467
7 238005.4 079996H1 1221 1447
7 238005.4 2755107H1 1244 1546
7 238005.4 2994580H1 1250 1546
7 238005.4 1357968H1 1253 1465
7 238005.4 2486058H1 1290 1528
7 238005.4 3155023H1 1308 1607
7 238005.4 1405122H1 1358 1634
7 238005.4 074212H1 1360 1547
7 238005.4 g1775652 1362 1565
7 238005.4 5426560H1 1368 1643
7 238005.4 2452873T6 1376 1902
7 238005.4 6597619H1 1385 1634
7 238005.4 2640268T6 1394 1890
7 238005.4 2971688H1 1399 1715
7 238005.4 2310518T6 1425 1902
7 238005.4 522334H1 1421 1673
7 238005.4 4787913H1 1436 1716
7 238005.4 6093903H1 1439 1775
7 238005.4 3716621H1 1449 1751
7 238005.4 3435261H1 1451 1718
7 238005.4 3256708H1 1454 1740
7 238005.4 769094H1 1454 1685
7 238005.4 g2526289 1458 1946
7 238005.4 4974673H1 1461 1728
7 238005.4 3870931H1 1471 1787
7 238005.4 4701192H1 1471 1746
7 238005.4 g4113880 1481 1936
7 238005.4 5703513H1 1508 1795
7 238005.4 5867713H1 1510 1740
7 238005.4 g3430405 1520 1936
7 238005.4 g4304488 1527 1925
7 238005.4 g3446639 1532 1952
7 238005.4 g3843850 1534 1956





7 


238005.4 


5049433T6 


1538 


1924 


7 


238005.4 


93895204 


1538 


1945 


7 


238005.4 


3343709T6 


1541 


1905 


7 


238005.4 


94267940 


1548 


1946 


7 


238005.4 


2269166H1 


1556 


1832 


7 


238005.4 


d211050H1 


1556 


1795 


7 


238005.4 


3858314H1 


1561 


1865 


7 


238005.4 


3868359H1 


1561 


1845 


7 


238005.4 


3861114H1 


1561 


1868 


7 


238005.4 


4215258H1 


1567 


1843 


7 


238005.4 


g1647982


1573 


1944 


7 


238005.4 


6106957H1 


1583 


1903 


7 


238005.4 


4974030H1 


1586 


1853 


7 


238005.4 


9518289 


1588 


1946 


7 


238005.4 


92526290 


1592 


1945 


7 


238005.4 


4562130H1 


1603 


1871 


7 


238005.4 


6093354H1 


1607 


1891 


7 


238005.4 


2611577H1


1612 


1852 


7 


238005.4 


92003142 


1617 


1950 


7 


238005.4 


92213309 


1628 


1935 


7 


238005.4 


91893696 


1633 


1960 


7 


238005.4


2759732H1 


1636 


1924 


7 


238005.4 


94243121 


1640 


1944 


7 


238005.4 


94108171 


1645 


1945 


7 


238005.4 


93765581 


1650 


1946 


7 


238005.4 


3027260T6 


1677 


1899 


7 


238005.4 


1213283H1 


1736 


1875 


7 


238005.4 


9846910 


1739 


1936 


7 


238005.4 


2753427H1 


1746 


1944 


7 


238005.4 


2150089H1 


1766 


1944 


7 


238005.4 


91775764 


1824 


1936 


7 


238005.4 


5288457H1 


1860 


1978 


7 


238005.4 


94285357 


1863 


1946 


10 


233678.1 


1803848F6 


1839 


2351 


10 


233678.1 


2355751H1


1855 


1998 


10 


233678.1 


1923122H1 


1895 


2186 


10 


233678.1 


1858559H1 


1896 


2188 


10 


233678.1 


2949425H1 


1896 


2193 


10 


233678.1 


1858559F6 


1896 


2326 


10 


233678.1 


4542183H1


1896 


2166 


10 


233678.1 


3786222H1 


1898 


2215 


10 


233678.1 


6091924H1 


1943 


2212 


10 


233678.1 


4584888H1 


1958 


2251 


10 


233678.1 


1651475H1 


1991 


2242 


10 


233678.1 


3967280H1 


2006 


2242 


10 


233678.1 


5623384H1 


2022 


2351 


10 


233678.1 


5623284H1 


2022 


2314 


10 


233678.1 


4948649H1 


2031 


2325 


10 


233678.1 


1521957H1 


2030 


2222 


10 


233678.1 


1726538H1 


2034 


2236 





10 


233678.1 


1723946H1 


2034 


2252 


10 


233678.1 


1605814H1 


2037 


2174 


10 


233678.1 


5554120H1 


2037 


2321 


10 


233678.1 


1874743F6 


2056 


2410 


10 


233678.1 


1874743H1 


2057 


2352 


10 


233678.1 


4857286H1 


2097 


2352 


10 


233678.1 


1509348H1 


2102 


2301 


10 


233678.1 


2242767H1 


2114 


2370 


10 


233678.1 


2848660H1 


2126 


2428 


10 


233678.1 


2666671H1 


2129 


2380 


10 


233678.1 


1576284H1 


2142 


2375 


10 


233678.1 


887052R1 


2142 


2769 


10 


233678.1 


887052H1 


2142 


2447 


10 


233678.1 


3785578H1 


2149 


2345 


10 


233678.1 


3ir)926Hl 


2149 


2473 


10 


233678.1 


766862H1 


2158 


2406 


10 


233678.1 


2227686H1 


2171 


2407 


10 


233678.1 


2013238H1 


2171 


2277 


10 


233678.1 


823920R1 


2176 


2721 


10 


233678.1 


1583169H1 


2176 


2393 


10 


233678.1 


823920H1 


2176 


2328 


10 


233678.1 


1583137H1 


2176 


2390 


10 


233678.1 


g1426492


2182 


2682 


10 


233678.1 


2275987H1 


2225 


2478 


10 


233678.1 


4730301H1 


2231 


2455 


10 


233678.1 


176909H1 


2231 


2495 


10 


233678.1 


1702710H1 


2230 


2441 


10 


233678.1 


2845122H1 


2231 


2466 


10 


233678.1 


5895659H1 


2245 


2544 


10 


233678.1 


4703091H1


2244 


2507 


10 


233678.1 


4214988H1 


2256 


2569 


10 


233678.1 


1622006H1 


2271 


2508 


10 


233678.1 


1260976T6 


2285 


2941 


10 


233678.1 


5874967H1 


2299 


2566 


10 


233678.1 


3291543H1 


2301 


2556 


10 


233678.1 


5874909H1 


2300 


2556 


10 


233678.1 


3291543F6 


2301 


2749 


10 


233678.1 


4588984H1 


2318 


2524 


10 


233678.1 


4564713H1 


2321 


2549 


10 


233678.1 


1607223F6 


2329 


2631 


10 


233678.1


1607223H1 






10 


233678.1 


777428R1 


2329 


2946 


10 


233678.1 


4196029H1 


2328 


2638 


10 


233678.1 


777428H1 


2329 


2573 


10 


233678.1 


3098571T6 


2334 


2942 


10 


233678.1 


2008649H1 


2355 


2571 


10 


233678.1 


1607223T6 


2360 


•mi 


10 


233678.1 


1803848T6 


2373 


2929 


10 


233678.1 


2210370T6 


2375 


2928 


10 


233678.1 


1701603H1 


2387 


2602 








10 


233678.1 


5286984H1 


2406 


2602 


10 


233678.1 


2355751T6 


2414 


2928 


10 


233678.1 


285554H1


2419 


2806 


10 


233678.1 


285554R6 


2420 


2891 


10 


233678.1 


3289542T6 


2435 


2926 


10 


233678.1 


2518190T6 


2436 


2916 


10 


233678.1 


285554T6 


2452 


2926 


10 


233678.1 


4949921H1


2453 


2734 


10 


233678.1 


2814466T6 


2451 


2932 


10 


233678.1 


3291543T6 


2464 


2930 


10 


233678.1 


4197595H1 


2466 


2785 


10 


233678.1 


5105034H1 


2468 


2740 


10 


233678.1 


92963989 


2472 


2972 


10 


233678.1 


92063675 


2474 


2971 


10 


233678.1 


9517590 


2474 


2968 


10 


233678.1 


4950109H1 


2484 


2768 


10 


233678.1 


1874743T6 


2498 


2928 


10 


233678.1 


94107810 


2500 


2968 


10 


233678.1 


94175509 


2507 


2977 


10 


233678.1 


93411743 


2509 


2973 


10 


233678.1 


1858559T6 


2609 


2930 


10 


233678.1 


5848668H1 


2517 


2804 


10 


233678.1


94269824 


2519 


2968 


10 


233678.1 


3576266T6 


2525 


2934 


10 


233678.1 


2623196H1


2526 


2783 


10 


233678.1 


93770184 


2541 


2971 


10 


233678.1 


93923313 


2548 


2973 


10 


233678.1 


3274147T6


2551 


2910 


10 


233678.1 


92356198 


2568 


2897 


10 


233678.1 


1684804T6 


2677 


2928 


10 


233678.1 


92784196 


2581 


2968 


10 


233678.1 


3236618H1


2583 


2844 


10 


233678.1 


92241760 


2584 


2968 


10 


233678.1 


9316793 


2586 


2979 


10 


233678.1 


92816392 


2588 


2977 


10 


233678.1 


91954535 


2588 


2782 


10 


233678.1 


93700690 


2586 


2968 


10 


233678.1 


91887793 


2590 


2968 


10 


233678.1 


3643238H1 


2593 


2897 


10 


233678.1 


3660438H1 


2594 


2894 


10 


233678.1 


2466533H1 


2612 


2850 


10 


233678.1 


2210370F6 


1632 


2079 


10 


233678.1 


2210367H1 


1632 


1886 


10 


233678.1 


4795286H1 


1659 


1948 


10 


233678.1 


9761628 


1670 


2092 


10 


233678.1 


3803218H1


1669 


1902 


10 


233678.1 


4379275H1 


1672 


1949 


10 


233678.1 


1606564H1 


1682 


1912 


10 


233678.1 


5657623H1 


1337 


1607 


10 


233678.1 


2312430H1 


1358 


1602 


10 


233678.1 


3390808H1 


1274 


1553 


10 


233678.1 


4171822H1 


1390 


1671 


10 


233678.1 


4508866H1 


1437 


1711 


10 


233678.1 


4505362H1 


1437 


1676 


10 


233678.1 


632776H1 


1461 


1683 


10 


233678.1 


2326782H1 


1451 


1677 


10 


233678.1 


1684804H1 


1281 


1507 


10 


233678.1


1684804F6 


1281 


1743 


10 


233678.1 


3948184H1 


1457 


1585 


10 


233678.1 


3804480H1 


1487 


1706 


10 


233678.1 


5546966H1 


1486 


1671 


10 


233678.1 


5545829H1 


1487 


1700 


10 


233678.1 


3088194H1 


1498 


1736 


10 


233678.1 


3288867H1 


1285 


1546 


10


233678.1 


1250975F1 


1577 


2054 


10 


233678.1 


1250975F6 


1577 


2051 


10 


233678.1 


5812225H1


1317 


1607 


10 


233678.1 


5897566H1 


1330 


1555 


10 


233678.1 


5893318H1 


1330 


1640 


10 


233678.1 


5900195H1


1330 


1548 


10 


233678.1 


5897434H1 


1330 


1411 


10 


233678.1 


5898716H1 


1330 


1634 


10 


233678.1 


1250975H1 


1577 


1826 


10 


233678.1 


5396479H1 


1593 


1862 


10 


233678.1 


3797732H1 


1604 


1709 


10 


233678.1 


5542957H1 


1604 


1815 


10 


233678.1 


5558368H1 


1609 


1869 


10 


233678.1 


1002266H1


1 


246 


10 


233678.1 


1002266R1 


1 


482 


10 


233678.1 


1399306H1 


74 


326 


10 


233678.1 


1399306F6 


74 


632 


10


233678.1 


1894055H1 


182 


422 


10 


233678.1 


1270831F1 


393 


979 


10 


233678.1 


1270831H1


393 


667 


10 


233678.1 


1003419R1 


414 


915 


10 


233678.1 


1003419H1 


414 


608 


10 


233678.1 


2518190F6 


454 


973 


10 


233678.1 


2518190H1 


454 


706 


10 


233678.1 


5659236H1 


474 


628 


10 


233678.1 


3288389H1 


1161 


1413 


10 


233678.1 


5005539H1 


1189 


1264 


10 


233678.1 


5546782H1 


1192 


1398 


10


233678.1 


2809270H1 


1239 


1456 


10 


233678.1 


2814466H1 


762 


1073 


10 


233678.1 


5594064H1 


496 


757 


10 


233678.1 


g764169 


598 


923 


10 


233678.1 


5546549H1 


626 


827 


10 


233678.1 


3289542F6 


652 


1084 


10 


233678.1 


3289542H1 


652 


893 


10 


233678.1 


5544312H1 


687 


896 





10 


233678.1 


2814466F6


762 


1334 


10 


233678.1 


2466560H1 


766 


1014 


10 


233678.1 


2661506H1 


769 


1035 


10 


233678.1 


2441878H1 


884 


988 


10 


233678.1 


2437941H1


893 


1114 


10 


233678.1 


4785794H1 


905 


1166 


10 


233678.1 


5574111H1


940 


1157 


10 


233678.1 


3964034H1 


1006 


1286 


10 


233678.1 


4786766H1 


1010 


1266 


10 


233678.1 


650941H1


1026 


1285 


10 


233678.1 


4601334H1 


1061 


1307 


10 


233678.1 


3275906H1 


1100 


1352 


10 


233678.1 


3274147F6


1101 


1655 


10 


233678.1 


3274147H1 


1101 


1348 


10 


233678.1


g3163776 


1145 


1490 


10 


233678.1 


g4244100 


2615 


2969 


10 


233678.1 


g4329102


2618 


2969 


10 


233678.1 


94451360 


2615 


2971 


10 


233678.1 


93802177 


2627 


2971 


10 


233678.1 


93092982 


2628 


2969 


10 


233678.1 


93921970 


2626 


2972 


10 


233678.1 


3516016H1 


2646 


2918 


10 


233678.1 


1377689H1 


2654 


2926 


10 


233678.1 


4986111H1


2666 


2956 


10 


233678.1 


9845966 


2675 


2968 


10 


233678.1 


93777917 


2683 


2968 


10 


233678.1 


9761629 


2702 


2962 


10 


233678.1 


93086793 


2719 


2968 


10 


233678.1 


5273307H1 


2727 


2976 


10 


233678.1 


91139236 


2748 


2972 


10 


233678.1 


4987591H1 


2793 


2957 


10 


233678.1 


4987593H1 


2794 


2968 


10 


233678.1 


2424047H1 


2804 


2968 


10 


233678.1 


6094676H1 


2807 


2968 


10 


233678.1 


3083086H1 


2842 


2968 


10 


233678.1 


3787162H1


2863 


2968 


10 


233678.1 


4901030H1 


2902 


2968 


10 


233678.1 


5103514H1 


2904 


2974 


10 


233678.1 


5060557H1 


1720 


2021 


10 


233678.1 


5193174H1


1731 


1877 


10 


233678.1 


5003089H1 


1750 


2052 


10 


233678.1 


3937103H1 


1754 


2052 


10 


233678.1 


2210665H1 


1754 


1986 


10 


233678.1 


4951636H2 


1754 


2033 


10 


233678.1 


4589805H1 


1764 


1970 


10 


233678.1 


2738891H1


1754 


1967 


10 


233678.1 


4793896H1 


1754 


2005 


10 


233678.1 


1651922H1 


1754 


1961 


10 


233678.1 


3121552H1 


1754 


2049 


10 


233678.1 


1236003H1 


1754 


1975 



WO 00/73509


10 


233678.1 


3361752H1 


1754 


2001 


10 


233678.1 


3282866H1 


1754 


2003 


10 


233678.1 


3114487H1 


1754 


2013 


10 


233678.1 


1652026H1 


1754 


1943 


10 


233678.1 


3804172H1


1809 


2145 


10 


233678.1 


92063923 


1820 


2231 


10 


233678.1 


5191775H1 


1834 


1987 


10 


233678.1 


1508970H1 


1833 


2054 


10 


233678.1 


763920H1 


1833 


2139 


10 


233678.1 


1803848H1 


1839 


1940 


10 


233678.1 


2674708H1 


1839 


2112 


11 


312243.1 


4605386H1 


1343 


1593 


11 


312243.1 


4605386F7 


1343 


1784 


11 


312243.1 


1483834H1 


1462 


1744 


11 


312243.1 


4696494H1 


1721 


1986 


11 


312243.1 


1823519H1 


1734 


1965 


11 


312243.1 


1823519F6


1734 


2013 


11 


312243.1 


1696230H1 


1749 


1855 


11 


312243.1 


1695671H1 


1757 


1986 


11


312243.1 


1696055H1 


1757 


1969 


11 


312243.1 


1460083H1 


1773 


2007 


11 


312243.1 


1823519T6 


1787 


2427 


11 


312243.1 


3846868H1 


1787 


1988 


11 


312243.1 


386870H1 


1792 


2067 


11 


312243.1 


91515884 


1866 


2193 


11 


312243.1 


2581154F6 


21 


197 


11 


312243.1 


2581154H1 


21 


289 


11 


312243.1 


694181H1


26 


222 


11 


312243.1 


3383640H1 


33 


277 


11 


312243.1 


878997H1 


250 


387 


11 


312243.1 


881475H1 


252 


495 


11 


312243.1 


878997R1 


252 


816 


11 


312243.1 


881475R6 


252 


706 


11 


312243.1 


641653R6 


480 


1033 


11 


312243.1 


641653H1 


480 


729 


11 


312243.1 


92055007 


674 


772 


11 


312243.1 


92063527 


684 


806 


11 


312243.1 


1316282H1 


912 


1148 


11 


312243.1 


2120964F6 


1099 


1478 


11 


312243.1 


2120964H1 


1099 


1353 


11 


312243.1 


94186493 


1088 


1442 


11 


312243.1 


94451005 


1162 


1479 


11


312243.1 


92273834 


1273 


1458 


11 


312243.1 


4605366H1 


1343 


1699 


11 


312243.1 


4605386T7 


1923 


2422 


11 


312243.1 


641653T6 


1947 


2441 


11 


312243.1 


2120964T6 


1957 


2429 


11 


312243.1 


94305745 


2052 


2366 


11 


312243.1 


881475T6 


2069 


2428 


11 


312243.1 


92138940 


2087 


2472


11


312243.1 


4601955H1 


2114 


2269 


11 


312243.1 


g3897513 


2313 


2467 


11


312243.1 


g3281088 


1 


220 


11


312243.1 


g4189325


1 


177 


11 


312243.1 


g3419223


1 


219 


11 


312243.1 


3460792H1 


1 


206 


12 


425487.3 


g4187652


25 


467 


12 


425487.3 


g3804148


48 


413 


12


425487.3


g4265777 


50 


441 


12


425487.3


g1810525


25 


243 


12 


425487.3 


g3075563 


51 


536 


12 


425487.3 


g1576978


582 


740 


12 


425487.3 


g2276774 




236 


12 


425487.3 


g1676933




386 


12


425487.3


gioou4ou




247 


12 


425487.3 


g3835346




498 


12 


425487.3 


93839178 




474 


12 


425487.3 


g3839255




504 


12 


425487.3 


g1920365




417 


12 


425487.3 


g4113340


25 


317 


12 


425487.3 


4845742H1 


69 


296 


12 


425487.3 


g2816806 


66 


402 


12


425487.3 


g2881811


75 


519 


12 


425487.3 


g2901201


84 


461 


12 


425487.3 


g2819800 


80 


494 


12 


425487.3 


g2899484


91 


540 


12 


425487.3 


g2240223


266 


660 


12 


425487.3 


5639438H1


531 


769 


18 


222429.3 


3135386H1


322 


523 


18 


222429.3


4193895H1 


745 


1038 


18


222429.3


g3917627 


750 


967 


18 


222429.3


4773983H1 


750 


1030 


18 


222429.3 


3110313H1


767 


1070 


18 


222429.3


3818019H1 


322 


637 


18 


222429.3


5270075H1 


770 


1008 


18 


222429.3 


2285210H1


324 


583 


18


222429.3


34olooOHl


783 


1039 


18 


222429.3 


5690354H1


788 


1045 


18


222429.3 


33688o6ril 


323 


607 


18 


222429.3 


5692143H1


796 


1012 


18 


222429.3


1695061H1


796 


886 


18 


222429.3 


622658H1 


803 


1062 


18 


222429.3 


4264839H1 


814 


968 


18 


222429.3 


4382955H1 


324 


595 


18 


222429.3 


2513281H1


325 


584 


18 


222429.3 


g1395660


816 


1293 


18


222429.3 


3182679H1 


325 


653 


18 


222429.3 


g2881072 


817 


1302 


18 


222429.3 


1852753H1 


328 


622 


18 


222429.3 


1852753F6 


329 


803 



PCT/US00/15404


18 


222429.3 


g4073351 


823 


1300 


18 


222429.3 


3619548H1 


332 


599 


18 


222429.3 


1772964H1 


829 


1110 


18 


222429.3 


5016650H1 


835 


1123 


18 


222429.3 


891033R1 


835 


967 


18 


222429.3 


2782761H1


341 


620 


18 


222429.3 


4855161H1 


347 


487 


18 


222429.3 


2802335H1 


380 


661 


18


222429.3 


4616684H1 


380 


663 


18 


222429.3 


g1395574 


387 


641 


18 


222429.3 


g2032347 


393 


677 


18 


222429.3 


3553107H1 


395 


615 


18 


222429.3 


4309262H1 


423 


747 


18 


222429.3 


4721466T6 


428 


934 


18 


222429.3 


3466526H1 


435 


706 


18 


222429.3 


2316414H1 


439 


697 


18 


222429.3 


93087119 


484 


970 


18 


222429.3 


2859627H1 


491 


779 


18 


222429.3 


g3888504 


540 


971 


18 


222429.3 


1266732H1 


640 


774 


18 


222429.3 


93887405 


651 


869 


18 


222429.3 


93801166 


560 


970 


18 


222429.3 


759622R1 


567 


804 


18 


222429.3 


759622H1 


567 


920 


18 


222429.3 


91496653 


586 


967 


18 


222429.3 


053812H1


589 


802 


18 


222429.3 


1822941H1


602 


829 


18 


222429.3 


93871768 


609 


967 


18 


222429.3 


93888431 


624 


972 


18 


222429.3 


5274484H1 


626 


878 


18 


222429.3 


93802660 


651 


970 


18


222429.3 


1852753T6 


655 


1252 


18 


222429.3 


6015049H1 


672 


969 


18 


222429.3 


1430609H1 


686 


919 


18 


222429.3 


798615H1


694 


975 


18 


222429.3 


3113611T6 


698 


1224 


18 


222429.3 


1685142H1 


742 


976 


18 


222429.3 


5900905H1 


743 


1032 


18 


222429.3 


2078759H1 


849 


1127 


18 


222429.3 


2043774H1 


851 


1124 


18 


222429.3 


92985167 


859 


1298 


18 


222429.3 


1402308H1 


861 


1118 


18 


222429.3 


9717242 


882 


971 


18 


222429.3 


94109637 


883 


1293 


18 


222429.3 


94110383 


891 


1293 


18 


222429.3 


93178557 


926 


1301 


18 


222429.3 


1463294H1 


928 


1142 


18 


222429.3 


1463335H1 


928 


1120 


18 


222429.3 


1463294T1 


928 


1248 


18 


222429.3 


93871712 


929 


1292


18 


222429.3 


g4307962 


950 


1295 


18 


222429.3 


1940845T6 


953 


1251 


18 


222429.3 


1940845R6 


953 


1294 


18 


222429.3 


1940846H1 


953 


1181 


18 


222429.3 


3730073H1 


957 


1278 


18 


222429.3 


g1218940


968 


1293 


18 


222429.3 


g2675322 


974 


1293 


18 


222429.3 


3726214H1


989 


1294 


18 


222429.3 


2872659H1 


994 


1288 


18 


222429.3 


2875466H1 


994 


1134 


18 


222429.3 


4247311H1


996 


1144 


18 


222429.3 


6007354H1 


1016 


1290 


18 


222429.3 


2154383H1 


1040 


1293 


18 


222429.3 


5680928H1 


1043 


1293 


18


222429.3


214245H1


1031 


1279 


18 


222429.3 


g1069690 


1091 


1208 


18 


222429.3 


92212472 


1098 


1350 


18 


222429.3 


839255H1 


1103 


1312 


18 


222429.3 


839255R1 


1103 


1293 


18 


222429.3 


3235786H1 


1143 


1252 


18 


222429.3 


891762R1 


1164 


1300 


18 


222429.3 


891762H1 


1164 


1300 


18 


222429.3 


1849630H1 


1184 


1293 


18 


222429.3 


1849630F6 


1184 


1284 


18


222429.3 


1849630T6 


1192 


1252 


18 


222429.3 


4865293H1 


1210 


1293 


18 


222429.3 


4721466F6 


1 


466 


18 


222429.3 


4721466H1 


1 


271 


18 


222429.3 


5638504H1 


65 


317 


18 


222429.3 


5638852H1 


65 


316 


18 


222429.3 


3076642H1 


228 


498 


18 


222429.3 


3076642F6 


228 


487 


18 


222429.3 


4940608H1 


247 


533 


18 


222429.3 


5388153H1 


267 


412 


18 


222429.3 


3750219H1 


253 


506 


18 


222429.3 


3937336H1 


253 


527 


18 


222429.3 


4122661H1


253 


500 


18 


222429.3 


3680329H1 


253 


672 


18 


222429.3 


4943143H1 


253 


516 


18 


222429.3 


4585314H1


254 


510 


18 


222429.3 


2732104H1 


254 


510 


18 


222429.3 


2537605H1 


257 


492 


18 


222429.3 


92015403 


274 


591 


18 


222429.3 


4786776H1 


274 


526 


18 


222429.3 


4045742H1 


274 


679 


18 


222429.3 


3567962H1 


274 


529 


18 


222429.3 


1386440H1 


276 


413 


18 


222429.3 


5187636H1 


275 


560 


18 


222429.3 


5197384H1 


277 


547 


18 


222429.3 


1572627H1 


277 


491


18 


222429.3 


1572748H1 


277 


502 


18 


222429.3 


2372113H1


279


527 


18 


222429.3 


5187033H1


279 


420 


18 


222429.3 


2785651H1


279 


563 


18 


222429.3 


2727323H1 


280 


534 


18 


222429.3 


3134034H1 


280 


554 


18 


222429.3 


927455H1 


280 


558 


18 


222429.3 


927455R1 


280 


894 


18 


222429.3 


2696333H1 


280 


528 


18 


222429.3 


2372113F6


280 


797 


18 


222429.3 


5847736H1 


281 


565 


18 


222429.3 


2853635H1 


281 


538 


18 


222429.3 


2854289H1 


281 


561 


18 


222429.3 


2730099H1 


281 


540 


18 


222429.3 


1268178F1


282 


704 


18 


222429.3 


2492274H1 


282 


517 


18 


222429.3 


4124267H1 


282 


513 


18 


222429.3 


2459370H1 


282 


511 


18 


222429.3 


1268178H1 


282 


556 


18 


222429.3 


6013641H1


282 


518 


18 


222429.3 


5378585H1 


283 


535 


18 


222429.3 


3985882H1


274 


384 


18 


222429.3 


3870288H1 


286 


576 


18 


222429.3 


4608143H1 


283 


534 


18 


222429.3 


3983482H1 


286 


464 


18 


222429.3 


5843644H1 


286 


511 


18 


222429.3 


2605545H1 


288 


536 


18 


222429.3 


3088525H1 


288 


577 


18 


222429.3 


3218430H1 


290 


687 


18 


222429.3 


g1496652


291 


538 


18 


222429.3 


4248167H1 


294 


568 


18 


222429.3 


4767534H1 


294 


424 


18 


222429.3 


4174335H1 


294 


613 


18 


222429.3 


3695889H1 


295 


582 


18 


222429.3 


5592876H1 


294 


443 


18 


222429.3 


2658609H1 


296 


546 


18 


222429.3 


730766H1 


296 


565 


18 


222429.3 


4613408H1 


297 


505 


18 


222429.3 


264145H1 


297 


616 


18 


222429.3 


5810372H1 


299 


666 


18 


222429.3 


3319788H1 


300 


476 


18 


222429.3 


4202422H1 


301 


596 


18


222429.3 


4613487H1 


306 


607 


18 


222429.3 


g712550 


306 


583 


18 


222429.3 


5090055H1 


307 


567 


18 


222429.3 


2823320H1 


309 


631 


18 


222429.3 


2529472H1 


311 


578 


18 


222429.3 


4850016H1


312 


590 


18 


222429.3 


4606729H1 


313 


569 


18 


222429.3 


4608174H1


313 


567


18 


222429.3 


4334782H1 


314 


584 


18


222429.3 


3508043H1 


314 


609 


18 


222429.3 


4638791H1 


316 


578 


18 


222429.3 


2941851H1 


316 


694 


18 


222429.3 


2533245H1 


317 


556 


18 


222429.3 


033345H1 


317 


444 


18 


222429.3 


2888083H1


317 


573 


18 


222429.3 


2766782H1 


318 


591 


18 


222429.3 


g1069689


321 


669 


18 


222429.3 


2514805H1 


321 


656 


19 


366739.2 


3793584H1 


1 


276 


19 


366739.2 


3093259F6 


1 


396 


19 


366739.2 


3093259H1 


1 


272 


19 


366739.2 


5686972H1 


28 


288 


19 


366739.2 


1562228H1 


28 


244 


19 


366739.2 


g1976820


31 


385 


19 


366739.2 


5472436H1 


30 


254 


19 


366739.2 


g1974601


38 


290 


19 


366739.2 


3536311H1 


63 


278 


19 


366739.2 


g1301142


203 


492 


19 


366739.2 


2996019H1


226 


495 


19 


366739.2 


1291844F6 


265 


754 


19 


366739.2 


1291844F1 


265 


757 


19 


366739.2 


1291844H1 


265 


495 


19 


366739.2 


6107426H1 


323 


395 


19 


366739.2 


6077830H1 


331 


653 


19 


366739.2 


2690675H1 


335 


549 


19 


366739.2 


3673120H1 


341 


654 


19 


366739.2 


4626753H1 


343 


626 


19 


366739.2 


2814563H1 


344 


559 


19 


366739.2 


5019593H1 


348 


607 


19 


366739.2 


5591354H1 


350 


602 


19 


366739.2 


4337887H1 


348 


633 


19 


366739.2 


2830045H1 


350 


618 


19 


366739.2 


3614922H1 


351 


637 


19 


366739.2 


3034106H1 


352 


640 


19 


366739.2 


4765938H1 


352 


635 


19 


366739.2 


3375151H1


354 


621 


19 


366739.2 


804576H1 


357 


612 


19 


366739.2 


2188984H1 


360 


622 


19 


366739.2 


157111H1


361 


569 


19 


366739.2 


5391316H1 


370 


647 


19 


366739.2 


808513H1 


372 


654 


19 


366739.2 


1966930H1 


381 


617 


19 


366739.2 


1966930R6 


381 


755 


19 


366739.2


5194739H1 


431 


658 


19 


366739.2 


4946548H1 


431 


604 


19 


366739.2 


4768533F6 


455 


865 


19


366739.2 


4768533H1 


455 


739 


19 


366739.2 


g778805 


457 


743


19 


366739.2 


g1921131


463 


977 


19 


366739.2 


4353206H2 


492 


757 


19 


366739.2 


5325529H1 


501 


794 


19 


366739.2 


5322286H1 


501 


774 


19 


366739.2 


5322867H1 


502 


752 


19 


366739.2 


869959H1 


564 


808 


19 


366739.2 


875716R6 


564 


1081 


19 


366739.2 


875716H1 


564 


855 


19 


366739.2 


875716R1 


564 


1203 


19 


366739.2 


3221359H1


601 


939 


19 


366739.2 


3869036H1 


630 


910 


19 


366739.2 


3353207H1 


645 


955 


19 


366739.2 


92016781 


644 


916 


19 


366739.2 


4587185H1 


645 


907 


19 


366739.2 


5406614H1 


653 


850 


19 


366739.2 


g1300764


650 


1123 


19 


366739.2 


370417H1


713 


1027 


19 


366739.2 


1265285H1 


720 


861 


19 


366739.2 


1265285R1


720 


1186 


19 


366739.2 


1265102H1 


720 


891 


19 


366739.2 


3946741H1


742 


1024 


19 


366739.2 


3603130H1 


771 


1093 


19 


366739.2 


5472209H1 


846 


1043 


19 


366739.2 


3717241H1 


870 


1172 


19 


366739.2 


1291844T6 


871 


1416 


19 


366739.2 


2608644H1 


898 


1103 


19 


366739.2 


g922177 


903 


1148 


19 


366739.2 


251965H1 


925 


1273 


19 


366739.2 


3470188H1 


935 


1210 


19 


366739.2 


92838122 


936 


1451 


19 


366739.2 


1636138F6 


963 


1441 


19


366739.2 


1636138H1 


963 


1184 


19 


366739.2 


923492H1 


968 


1270 


19 


366739.2 


3811371H1 


977 


1294 


19 


366739.2 


93430481 


978 


1454 


19 


366739.2 


94282887 


982 


1458 


19 


366739.2 


94372283 


986 


1454 


19 


366739.2 


5138130H1 


1015 


1307 


19 


366739.2 


92958044 


1019 


1453 


19 


366739.2 


1636138T6


1035 


1416 


19 


366739.2 


92397938 


1039 


1452 


19 


366739.2 


1415729H1 


1048 


1289 


19 


366739.2 


676581H1 


1055 


1273 


19 


366739.2 


3093259T6 


1056 


1412 


19 


366739.2 


91921132 


1059 


1465 


19 


366739.2 


337785H1 


1075 


1288 


19 


366739.2 


3773169H1 


1088 


1367 


19 


366739.2 


92820846 


1095 


1453 


19 


366739.2 


92322961 


1109 


1457 


19 


366739.2 


93923575 


1124 


1453


19 


366739.2 


5399095H1 


1135 


1273 


19 


366739.2 


g1266798


1140 


1469 


19 


366739.2 


g2883870 


1143 


1434 


19 


366739.2 


94325983 


1152 


1468 


19 


366739.2 


94325893 


1154 


1468 


19 


366739.2 


1223335T1 


1154 


1414 


19 


366739.2 


1223335H1 


1164 


1434 


19 


366739.2 


93959938 


1177 


1463 


19 


366739.2 


91970868 


1179 


1468 


19 


366739.2 


2228180H1


1186 


1460 


19 


366739.2 


92574692 


1223 


1462 


19 


366739.2 


2505795H1 


1237 


1466 


19 


366739.2 


92877025 


1242 


1453 


19 


366739.2 


94268440 


1269 


1453 


19


366739.2 


g2575541 


1264 


1453 


19 


366739.2 


9778806 


1268 


1452 


19 


366739.2 


1772864R6 


1273 


1453 


19 


366739.2 


1772849H1 


1273 


1453 


19 


366739.2 


91264282 


1343 


1473 


20 


474635.6


340786H1 


2181 


2422 


20 


474635.6 


2242239H1 


2179 


2347 


20 


474635.6 


9991123 


2179 


2514 


20 


474635.6 


2234442H1 


2179 


2397 


20 


474635.6 


91056896 


2180 


2498 


20 


474635.6 


9959046 


2182 


2471 


20


474635.6 


91004715 


2183 


2627 


20 


474635.6 


4996329T6 


2189 


2626 


20 


474635.6 


47464^1 


2192 


2451 


20 


474635.6 


93737429 


2197 


2728 


20 


474635.6 


91039999 


2210 


2512 


20 


474635.6


91081479 


2210 


2391 


20 


474635.6 


2204795T6 


2217 


2671 


20 


474635.6 


93736649 


2220 


2704 


20 


474635.6 


2242239T6 


2220 


2816 


20 


474635.6 


3733918T6


2230 


2786 


20 


474635.6 


93742588 


2229 


2728 


20 


474635.6 


91014336 


2236 


2566 


20 


474635.6 


3638137H1 


2230 


2511 


20 


474635.6 


9835640 


2232 


2567 


20 


474635.6 


1666516H1 


2236 


2449 


20 


474635.6


2083270H1 


2262 


2667 


20 


474635.6 


1761002H1 


2284 


2681 


20 


474635.6 


612373H1 


2291 


2581 


20 


474635.6 


665014H1 


2291 


2558 


20 


474635.6


9867547 


2291 


2662 


20 


474635.6 


91986029 


2312 


2772 


20 


474635.6 


3000193H1 


2318 


2627 


20 


474635.6 


461079H1 


2328 


2588 


20 


474635.6 


g1274411


2332 


2863 


20 


474635.6 


2237019H1


2332 


2557 



20 


474635.6 


g866812 


2334 


2693 


20 


474635.6 


5275784H1 


2343 


2627 


20 


474635.6 


4085512H1 


2393 


2714 


20 


474635.6 


g1240371 


2407 


2873 


20 


474635.6 


g848488 


2408 


2767 


20 


474635.6 


g897948 


2408 


2722 


20 


474635.6 


g848612 


2408 


2850 


20 


474635.6 


g1444420


2410 


2880 


20 


474635.6 


665169H1


2419 


2677 


20 


474635.6 


2090332H1 


2439 


2728 


20 


474635.6 


g1004911


2454 


2877 


20 


474635.6 


g2900276 


2464 


2950 


20 


474635.6 


g3848664 


2464 


2910 


20 


474635.6 


5432669H1 


2472 


2741 


20 


474635.6


3134278H1


2486 


2792 


20 


474635.6 


g4174578


2487 


2999 


20 


474635.6 


g3433111


2489 


2991 


20 


474635.6


g3419207 


2491 


3017 


20 


474635.6


4087752H1 


2602 


2814 


20 


474635.6 


5327883H1 


2520 


2806 


20 


474635.6


5326114H1


2623 


2818 


20 


474635.6 


5326214H1 


2524 


2806 


20 


474635.6


g3229642 


2527 


3017 


20 


474635.6 


2757338F6 


2528 


3008 


20 


474635.6 


93405868 


938 


1282 


20 


474635.6


572947H1 


944 


1190 


20 


474635.6 


3002964H1 


961 


1013 


20 


474635.6 


g982216 


976 


1338 


20 


474635.6 


2309094H1 


1029 


1217 


20 


474635.6 


4638168H1 


1036 


1287 


20 


474635.6 


1912857H1 


1042 


1285 


20 


474635.6 


g916500 


1047 


1247 


20 


474635.6 


3633880H1 


1055 


1354 


20 


474635.6 


3634680H1 


1055 


1325 


20 


474635.6 


4546166H1 


1074 


1351 


20 


474635.6 


g1276377


1073 


1326 


20 


474635.6 


5677925H1 


1154 


1382 


20 


474635.6 


4663039H1 


1163 


1425 


20 


474635.6 


35n 926Hl 


1174 


1426 


20 


474635.6 


4996329H1 


1324 


1605 


20 


474635.6 


4996329F6 


1324 


1683 


20 


474635.6


4740909H1 


1330 


1603 


20 


474635.6 


647910H1


1344 


1623 


20 


474635.6 


1715955F6 


1431 


1896 


20 


474635.6 


3281456H1 


1463 


1723 


20 


474635.6 


2204796F6 


1494 


2015 


20 


474635.6 


2204795H1 


1494 


1739 


20 


474635.6 


4546440H1 


1504 


1781 


20 


474635.6 


1669176H1


1624 


1751 


20 


474635.6


1443714F6 


1644 


2033 



20 


474635.6


i*t*Tv/ i*tn 1 


1644 


lOOO 
ItUU 


20 




1715055H1 


16R0 


lAOA 
IOtO 


20 


474635.6


40^^7911-11 
*iuoo / ^ 1 n 1 




1 VOU 


20 


474635.6




171ft 
1 / 10 


ivoy 


20 


474635.6


3399A7ZH1 


1790 


10A7 
ItO/ 


20 


474635.6


373*^01 APA 
o/ ooy 1 oro 


loUO 


zZ 10 


20 


474635.6


**** 1 WJHHn 1 


1 A.'^9 


zUoo 


20 


474635.6




lOOO 


zl44 


20 


474635.6


5nQft379Hl 


1030 


ZIDU 


20 


474635.6


^19rMAM1 


1033 


01 '^7 
Z 10/ 


20 


474635.6


1965391D1 

1 ^\jO0£.01\ I 


103«\ 
ItOO 


z«loo 


20 






ItOO 


Z 101 


20 


474635.6




10A1 


z lOi 


20 


474635.6




lOAl 


Z I**/ 


20 


474635.6


1631Q3QH1 


10A1 


91<>1 


20 


474635.6 


61095S0H1 


10AA 


Z lOI 


20 


474635.6


u 1 UiiOvXjn 1 


10AA 


Z 104 


20 


474635.6 


nQ5Aft16 

^ T woo 1 \J 


9179 


9'^7A 
ZO/O 


20 


474635.6


n 1047675 


9173 


ZOUO 


20 


474635.6


a96597Q 


9173 


ZOOO 


20 


474635.6 


03098969 


9173 


9*^10 
Zu lU 


20 


474635.6 


al202194 

M 1 1 7— T 


2174 


959R 
ZwZO 


20 


474635.6 


n 1099631 


9174 


95A7 
zoo/ 


20 


474635.6 


a 1047669 


9174 


9*\14 
ZO 1 4 


20 


474635.6


nl99R900 


' 91 7A 


ZOUO 


20 


474635.6


nl 1035A3 


91 7A 


O^A/I 
Z404 


20 


474635.6


n9919515 


91 7A 


OAA7 
ZOO/ 


20 


474635.6


99499 '^OPA 


91 7A 
Z 1 /O 


OA>IA 
Z040 


20 


474635.6


ol91 1570 


91 7A 
Z 1 /O 


94A7 


20 


474635.6 


n 1995734 


9176 

Z 1 f D 


0401 

Z*IT 1 


20 


474635.6


n 1996785 
y 1 ^^yji Ow 


9176 
Z 1 /o 


94A0 
Z40Z 


20 


474635.6 


o 1043793 


91 76 
z 1 / V/ 


9*^09 
ZOtZ 


20 


474635.6 


□1162551 


9176 


ZOUO 


20 


474635.6


1725002H1 


1 
1 


907 
zu/ 


20 


474635.6 


1726247H1 


1 


91A 
Z lO 


20 


474635.6


3495066H1 


1 
1 


301 
OU 1 


20 


474635.6


1797369F6 


1 


3A7 
00/ 


20 


474635.6


1727369H1 


1 


940 
Zht 


20 


474635.6


3109953H1 


10 


OUO 


20 


474635.6


2369517H1


16 


07O 
Z/t 


20 


474635.6


2361835H1 
\ o\/wn 1 


16 


970 
Z/U 


20 


474635.6 


3617437H1 


58 


382 


20 


474635.6 


2659652H1 


91 


348 


20 


474635.6 


3734189H1


92 


393 


20 


474635.6 


2679260H1 


96 


409 


20 


474635.6 


1466462H1 


96 


295 


20 


474635.6 


3563056H1 


96 


415 


20 


474635.6 


2235355H1 


100 


364 


20 


474635.6 


2202076H1 


99 


367 


20 


474635.6 


3129502H1 


101 


436 



20 


474635.6 


2051824H1 


118 


293 


20 


474635.6 


3633032H1 


122 


294 


20 


474635.6


3160266H1 


126 


379 


20 


474635.6 


2051825H1 


132 


414 


20 


474635.6


g1227685


128 


482 


20 


474635.6 


4288817H1


146 


437 


20 


474635.6


4860794H1 


155 


246 


20 


474635.6 


174879H1 


161 


364 


20 


474635.6


5700836H1 


163 


471 


20 


474635.6 


5539631H2


182 


413 


20 


474635.6 


5206046H1 


220 


491 


20 


474635.6 


375911R6


244 


666 


20 


474635.6


376911H1


244 


536 


20 


474635.6


g657023 


256 


600 


20 


474635.6


3939767H1 


276 


433 


20 


474635.6


3939749H1 


276 


433 


20 


474635.6


g704964 


290 


689 


20 


474635.6 


91013184 


290 


695 


20 


474635.6


g1472478


326 


796 


20 


474635.6


1220725H1 


365 


600 


20 


474635.6 


4980839H1 


368 


659 


20 


474635.6 


375911T6


370 


902 


20 


474635.6 


3675542H1 


371 


682 


20 


474635.6 


3669542H1 


371 


695 


20 


474635.6


3671542H1 


371 


558 


20 


474635.6 


g1447913


383 


835 


20 


474635.6


1282181H1


391 


534 


20 


474635.6


g2020599 


418 


783 


20 


474635.6 


4133026H2 


447 


725 


20 


474635.6


2236848H1 


460 


698 


20 


474635.6


g1278197 


473 


1017 


20 


474635.6 


g1941907


514 


972 


20 


474635.6 


g2695124 


514 


1013 


20 


474635.6 


1470350F6 


554 


998 


20 


474635.6 


578500H1 


567 


768 


20 


474635.6 


g4371917 


558 


1013 


20 


474635.6 


g1447816 


668 


1013 


20 


474635.6


g2702699 


559 


1013 


20 


474635.6 


505476H1 


560 


793 


20 


474635.6 


2086083H1 


579 


884 


20 


474635.6


g3430649 


602 


1020 


20 


474635.6 


g1671099 


609 


1029 


20 


474635.6 


g1472421


658 


1013 


20 


474635.6 


g656867 


700 


1013 


20 


474635.6


3381660H1 


716 


902 


20 


474635.6 


4270822H1 


742 


1016 


20 


474635.6


g1977606


753 


1013 


20 


474635.6 


4594612H1


765 


1013 


20 


474635.6 


9944107 


761 


1016 


20 


474635.6


1470350H1 


802 


998 



20 


474635.6 


2362517T6


802 


970 


20 


474635.6 


g1011912


812 


1004 


20 


474635.6 


g1201979


871 


1013 


20 


474635.6


4412657H1 


883 


969 


20 


474635.6 


9916501 


928 


1234 


20 


474635.6 


gl238^8 


931 


1126 


20 


474635.6 


g2057226 


934 


1034 


20 


474635.6 


3450474H1 


938 


1128 


20 


474635.6 


92022938 


938 


1147 


20 


474635.6 


9982171 


939 


1266 


20 


474635.6 


2757338H1 


2528 


2825 


20 


474635.6 


2757338R6 


2528 


2995 


20 


474635.6 


5425690H1 


2531 


2796 


20 


474635.6 


g1046524 


2536 


2900 


20 


474635.6 


g3134715 


2534 


2985 


20 


474635.6 


380176H1


2537 


2654 


20 


474635.6


91186355 


2660 


2772 


20 


474635.6 


g856701 


2582 


2772 


20 


474635.6 


1795930H1 


2683 


2701 


20 


474635.6 


g651862 


2589 


2849 


20 


474635.6 


9651879 


2589 


2896 


20 


474635.6 


93076096 


2594 


2992 


20 


474635.6 


94078849 


2605 


2999 


20 


474635.6


9835694 


2606 


2985 


20


474635.6 


g1081771


2618 


2893 


20 


474635.6 


91266372 


2625 


2900 


20 


474635.6 


971739H1 


2631 


29^ 


20 


474635.6 


1808492H1 


2639 


2869 


20 


474635.6 


649261H1


2645 


2940 


20 


474635.6 


9867525 


2662 


2949 


20 


474635.6


1352072H1 


2682 


2960 


20 


474635.6 


9848398 


2688 


2993 


20 


474635.6 


93754493 


2700 


2999 


20 


474635.6


92839098 


2719 


3187 


20 


474635.6 


g848419 


2728 


3010 


20 


474635.6 


91524564 


2740 


3015 


20 


474635.6 


9806064 


2765 


2972 


20 


474635.6 


1402667H1 


2791 


3002 


20 


474635.6


g2752910 


2841 


2938 


20 


474635.6 


g715966 


2849 


2968 


20 


474635.6 


93428030 


2948 


2997 


24 


411449.2


3292975H1 


664 


917 


24 


411449.2


91277242 


663 


1052 


24 


411449.2


1365750H1 


662 


944 


24 


411449.2 


1355760F6 


662 


1150 


24 


411449.2 


91443735 


707 


1043 


24 


411449.2 


1984010H1 


722 


990 


24 


411449.2


g1243124 


747 


1012 


24 


411449.2


91243130 


747 


939 


24 


411449.2


91243104 


748 


938 



24 


411449.2 


g1243131


773 


923 


24 


411449.2


0841841 


823 


1176 


24 


411449.2 


91635822 


830 


1026 


24 


411449.2 


677686H1 


890 


1140 


24 


411449.2 


2896619H1


891 


1077 


24 


411449.2 


93933055 


644 


1016 


24 


411449.2 


1689376H1 


898 


1105 


24 


411449.2 


3783111H1


908 


1221 


24 


411449.2 


4796764H1 


515 


795 


24 


411449.2 


4796772H1 


515 


795 


24 


411449.2 


92037311 


533 


814 


24 


411449.2 


92819773 


551 


815 


24 


411449.2 


057539H1 


607 


708 


24 


411449.2 


93096856 


660 


922 


24 


411449.2


91125331 


560 


960 


24 


411449.2 


91386245 


683 


970 


24 


411449.2 


3323088H1 


586 


854 


24 


411449.2


92816548 


607 


1022 


24 


411449.2


3660962H1 


612 


867 


24 


411449.2 


91887649 


616 


946 


24 


411449.2 


91489541 


618 


866 


24 


411449.2 


91489523 


619 


965 


24 


411449.2 


94327095 


909 


1337 


24 


411449.2 


5275975H1


911 


1077 


24 


411449.2 


2682468H1 


941 


1157 


24 


411449.2 


2682429H1 


943 


1218 


24 


411449.2 


3448516T6 


966 


1496 


24 


411449.2 


5517642H1 


966 


1232 


24 


411449.2


3723582H1 


970 


1267 


24 


411449.2


589712H1 


978 


1222 


24 


411449.2 


589712R1 


978 


1519 


24 


411449.2


2613128H1 


1015 


1252 


24 


411449.2


1355750T6 


1052 


1482 


24 


411449.2


3820979H1 


1057 


1340 


24 


411449.2 


1811236H1 


1068 


1326 


24 


411449.2 


1811236F6 


1068 


1562 


24 


411449.2 


1811236T6 


1083 


1726 


24 


411449.2


g4033830


1106 


1519 


24 


411449.2 


g3917076


1106 


1619 


24 


411449.2


g3917072


1108 


1519 


24 


411449.2


3297093H1 


1117 


1204 


24 


411449.2 


4513565H1 


1174 


1440 


24 


411449.2 


4798512H1


1190 


1460 


24 


411449.2 


2645110H1


1195 


1450 


24 


411449.2 


2096680R6 


1216 


1644 


24 


411449.2 


2096680H1 


1216 


1462 


24 


411449.2


321613H1 


1233 


1492 


24 


411449.2 


2995437H1 


1251 


1638 


24 


411449.2 


g4371719


1378 


1775 


24 


411449.2


2096680T6 


1380 


1715 



24 


411449.2


4653462H1 


1411 


1685 


24 


411449.2


g1489542


1416 


1765 


24 


411449.2 


4144044H1 


1422 


1709 


24 


411449.2 


g1224160


1425 


1770 


24 


411449.2 


g1224144


1433 


1770 


24 


411449.2 


g1489624


1473 


1766 


24 


411449.2


g769281 


1485 


1823 


24 


411449.2


g2955239 


1501 


1777 


24 


411449.2 


g4078417 


1503 


1770 


24 


411449.2 


g1224166


1517 


1770 


24 


411449.2 


g1224143


1529 


1770 


24 


411449.2 


1972386H1 


1531 


1804 


24 


411449.2 


g4084798 


1534 


1954 


24 


411449.2 


g1224165


1560 


1770 




24 


411449.2 


rt ^ 


1564 


1798 


24 


411449.2 


g2657184 


1564 


1868 


24 


411449.2 


5919979H1 


1256 


1569 


24 


411449.2 


008248H1 


1258 


1569 


24 


411449.2 


g889333 


1271 


1666 


24 


411449.2 


g1139751


1287 


1746 


24 


411449.2


589712F1 


1305 


1770 


24 


411449.2


3659604H1 


1316 


1588 


24 


411449.2


2598057H1 


1327 


1458 


24 


411449.2


g3178153


1345 


1772 


24 


411449.2 


g4187548 


1373 


1774 


24 


411449.2


g574838 


1617 


1824 


24 


411449.2


3778628H1 


1645 


1948 


24 


411449.2 


g3785651 


1656 


1770 


24 


411449.2 


763788H1 


1696 


1765 


24 


411449.2 


1490003H1 


1788 


2054 


24 


411449.2 


g2000780 


1844 


2185 


24 


411449.2 


4704775H1 


1862 


2105 


24 


411449.2 


g3921576 


1873 


2274 


24 


411449.2 


g3231159 


1901 


2276 


24 


411449.2


g2222962 


1902 


2260 


24 


411449.2 


g2222973 


1902 


2258 


24 


411449.2 


638241H1


1908 


2160 


24 


411449.2


001302H1 


1908 


2236 


24 


411449.2 


1625423F6 


1911 


2197 


24 


411449.2 


1625423H1 


1911 


2113 


24 


411449.2


491366H1 


1923 


2185 


24 


411449.2


g1964447


1927 


2193 


24 


411449.2 


g1963729


1927 


2263 


24 


411449.2


g1379590


1936 


2272 


24 


411449.2 


g1472323


1948 


2197 


24 


411449.2 


2399751H1


1951 


2192 


24 


411449.2 


4856606H1 


1973 


2197 


24 


411449.2 


995171H1


1978 


2209 


24 


411449.2 


g1472316


1979 


2197 


24 


411449.2 


5194364H2 


1982 


2194 



24 


411449.2


3451002H1 


2043 


2168 


24 


411449.2 


g4624789 


2083 


2553 


24 


411449.2 


g2159625 


2209 


2405 


24 


411449.2 


g2154000


2210 


2639 


24 


411449.2 


g3739242


2216 


2561 


24 


411449.2 


5276078H1 


2220 


2494 


24 


411449.2 


g889241 


2223 


2564 


24 


411449.2 


g1148058


2235 


2554 


24 


411449.2


g3077322 


2268 


2561 


24 


411449.2 


g784419


2295 


2555 


24 


411449.2 


2758260H1 


2312 


2541 


24 


411449.2 


g4109094


2349 


2619 


24 


411449.2


g1785173


2442 


2554 


24 


411449.2 


3504710H1 


2444 


2554 


24 


411449.2


g3678027


2544 


2899 


24 


411449.2 


g3412805


2545 


2944 


24 


411449.2 


g3214749


2546 


2950 


24 


411449.2 


g4149617 


2546 


2995 


24 


411449.2 


g2969810


2546 


2703 


24 


411449.2 


g746614


2551 


2916 


24 


411449.2 


g3756632


2550 


2615 


24 


411449.2


g2969693


2551 


2970 


24 


411449.2 


g3770455 


2552 


2832 


24 


411449.2 


g2538262


2552 


2618 


24 


411449.2


g2882695


2552 


2871 


24 


411449.2 


3073355H1 


3031 


3337 


24 


411449.2


2502022H1 


3034 


3299 


24 


411449.2 


2203860H1 


3035 


3321 


24 


411449.2 


2415335H1 


3040 


3241 


24 


411449.2 


4068029H1 


3043 


3337 


24 


411449.2 


3633303H1 


3044 


3319 


24 


411449.2 


1972353H1 


3039 


3301 


24 


411449.2 


g746725


3044 


3326 


24 


411449.2 


3737404H1 


3044 


3324 


24 


411449.2 


3557887H1 


3044 


3317 


24 


411449.2 


3295556H1 


3046 


3325 


24 


411449.2 


4114476H1 


3047 


3337 


24 


411449.2 


4795315H1 


3048 


3337 


24 


411449.2 


577995H1 


3047 


3307 


24 


411449.2


3167827H1 


3051 


3339 


24 


411449.2


5686358H1 


3055 


3339 


24 


411449.2


2781948H1 


3066 


3331 


24 


411449.2 


2561784H2 


3068 


3340 


24 


411449.2


4712882H1 


3077 


3339 


24 


411449.2 


5374882H1 


3086 


3335 


24 


411449.2 


g2153894


3087 


3334 


24 


411449.2


3596863H1 


3133 


3337 


24 


411449.2 


4675071H1


3155 


3312 


24 


411449.2 


5285819H1 


3161 


3337 


24 


411449.2


3638749H1 


3186 


3301 


24 


411449.2 


27074oOHl 


3198 


3319 


24 


411449.2 


OoyD^4Un 1 


*xonn 


OOO/ 


24 


411449.2 


oUo9/ 1 on 1 


o2ol 


oo 16 


24 


411449.2 


g1781709 


1 


493 


24 


411449.2 


gl775ooo 


1 


432 


24 


411449.2 


24/2307rO 


1o 


397 


24 


411449.2 


2472307H1 


18 


265 


24 


411449.2 


g822142 


36 


381 


24 


411449.2 


g2000779 


51 


258 


24 


411449.2 


2782774H1 


79 


356 


24 


411449.2 


g1295851 


138 


810 


24 


411449.2 


g1312393 


138 


614 


24 


411449.2 


3448616R6 


142 


624 


24 


411449.2 


glOlDoOo 


2552 


2714 


24 


411449.2 


g1692011 


2553 


oocx 


24 


411449.2 


127470Url 


200D 


2o4o 


24 


411449.2 


g2668624 


2554 


2974 


24 


411449.2 


gl9214o4 


2554 


2863 


24 


411449.2 


g1153184 


2554 


2764 


24 


411449.2 


g2768980 


2555 


2668 


24 


411449.2 


g2 loo/o/ 


2000 


2972 


24 


411449.2 


Qo 1 1 /4U0 


2O0O 


ODCO 


24 


411449.2 


g2933847 


2554 


2995 


24 


411449.2 


g29o4 1 00 


2000 


3034 


24 


411449.2 


g2ooo9oo 


2556 


2868 


24 


411449.2 


201o010nl 


2562 


2849 


24 


411449.2 


giooo4oo 


2656 


2796 


24 


411449.2 


2050014H1 


2689 


2969 


24 


411449.2 


2910217H1 


2694 


2967 


24 


411449.2 


g2153999 


2709 


3188 


24 


411449.2 


g2204848 


2778 


3188 


24 


411449.2 


2476089H1 


2798 


3037 


24 


411449.2 


g1925326 


2847 


3323 


24 


411449.2 


g1678125 


2913 


3317 


24 


411449.2 


g2558304 


2937 


3101 


24 


411449.2 


g2IUU1o2 


2951 


ooo4 


24 


411449.2 





2971 


Olo2 


24 


411449.2 


00 1 0029n 1 


2976 


OOl 1 


24 


411449.2 


04/o400n 1 


2979 


OOO/ 


24 


411449.2 


1 AOA A7 1 M 1 


ZVOO 


o^ou 


24 


411449.2 


4o 1 296/n 1 


2987 


3259 


24 


411449.2 


g1692106


2999 


3198 


24 


411449.2 


3081295H1 


3009 


3317 


24 


411449.2 


4819058H1 


3020 


3324 


24 


411449.2 


3492756H1 


3028 


3337 


24 


411449.2 


3448616H1 


142 


413 


24 


411449.2 


6138342H1 


150 


451 


24 


411449.2 


1957671H1 


154 


438 


24 


411449.2 


g3888560 


166 


634 


24 


411449.2 


239023H1 


199 


433 



24 


411449.2 


463172H1 


205 


453 


24 


411449.2 


g1471862


257 


732 


24 


411449.2 


g1471856


257 


691 


24 


411449.2 


5153884H1 


352 


608 


24 


411449.2 


593640H1 


385 


596 


24 


411449.2 


g2037867 


408 


741 


24 


411449.2 


g2946071 


438 


791 


25 


18549.2 


2666675H1 


1 


256 


25 


18549.2 


2666675F6 


1 


447 


25 


18549.2 


2872056H1 


35 


324 


25 


18549.2 


3674803H1 


45 


340 


25 


18549.2 


g4393247 


48 


393 


26 


236043.3 


4854788H1 


57 


322 


26 


236043.3 


2637174H1 


57 


313 


26 


236043.3 


4154G39H1 


57 


326 


26 


236043.3 


4154437H1 


66 


332 


26 


236043.3 


g1784274


72 


327 


26 


236043.3 


g1782229


74 


421 


26 


236043.3 


5167751H1


1 


259 


26 


236043.3 


4154654H1 


31 


288 


26 


236043.3 


2637174F6 


57 


407 


26 


236043.3 


5856494H1 


52 


322 


26 


236043.3 


753667H1 


57 


311


26 


236043.3 


2638555P6 


615 


912 


26 


236043.3 


2638655T6 


624 


1270 


26 


236043.3 


4000427H1 


750 


952 


26 


236043.3 


118534H1 


826 


1086 


26 


236043.3 


g1782014


858 


1286 


26 


236043.3 


g2553565 


891 


1313 


26 


236043.3 


g707651 


915 


1263 


26 


236043.3 


506959H1 


946 


1251 


26 


236043.3 


g1792156


953 


1283 


26 


236043.3 


g2616478 


957 


1266 


26 


236043.3 


g1792664


975 


1281 


26 


236043.3 


g3754889 


1007 


1313 


26 


236043.3 


g707650 


1011 


1322 


26 


236043.3 


g1784007


1062 


1313 


26 


236043.3 


g1762018


1086 


1313 


26 


236043.3 


1209542H1


1096 


1326 


26 


236043.3 


2433413H1 


1134 


1379 


26 


236043.3 


g1792600


1184 


1283 


26 


236043.3 


4401019T6 


1210 


1699 


26 


236043.3 


2637174T6 


1202 


1260 


26 


236043.3 


g2342125 


1329 


1530 


26 


236043.3 


118858H1 


75 


275 


26 


236043.3 


g1760400


77 


392 


26 


236043.3 


g1664195


79 


446 


26 


236043.3 


g1784031


82 


542 


26 


236043.3 


6298110H1


93 


360 


26 


236043.3 


5280294H1 


110 


358 


26 


236043.3 


3516062H1 


113 


391 


26 


236043.3 


4012792H1 


114 


402 


26 


236043.3 


5856458H1 


176 


453 


26 


236043.3 


5856757H1 


176 


310 


26 


236043.3 


g1784279


204 


475 


26 


236043.3 


g707758


277 


621 


26 


236043.3 


4150892H1 


309 


572 


26 


236043.3 


g1792258


414 


803 


26 


236043.3 


3420770H1 


432 


622 


26 


236043.3 


2638855H1 


615 


846 


26 


236043.3 


121222H1 


615 


730 


27 


445433.2


3287379H1 


1 


244 


27 


445433.2 


2275475H1 


193 


424 


27 


445433.2 


3967290T6 


308 


705 


27 


445433.2


3967290F6 


315 


723 


27 


445433.2 


4133307H1 


315 


574 


27 


445433.2 


3967290H1 


315 


558 


27 


445433.2 


1343762F6 


681 


1042 


29 


257121.2 


2808987H1 


805 


1039 


29 


257121.2 


4021166H1 


831 


1064 


29 


257121.2 


g1720349


2858 


3363 


29 


257121.2 


g1522235


2858 


3196 


29 


257121.2 


3517880H1 


2876 


3155 


29 


257121.2 


4459330H1 


2876 


3127 


29 


257121.2 


3917168H1 


2886 


3150 


29 


257121.2 


3160987H1 


2897 


3193 


29 


257121.2 


4506429H1 


2901 


2981 


29 


257121.2 


4624935H1 


2909 


3181 


29 


257121.2 


879657H1 


3382 


3640 


29 


257121.2 


434236H1 


3386 


3610 


29 


257121.2 


2078370H1 


2936 


3207 


29


257121.2 


5058836H1 


2958 


3241 


29 


257121.2 


1955941H1


2984 


3250 


29 


257121.2 


3993470H1 


3016 


3322 


29 


257121.2 


2693754H1 


3017 


3296 


29 


257121.2


3528783H1 


3020 


3325 


29 


257121.2 


2945721H1


3030 


3196 


29 


257121.2 


2947643H1 


3028 


3355 


29 


257121.2 


5271360H1 


3029 


3276 


29 


257121.2 


g3840958


3443 


3849 


29 


257121.2 


406453H1 


3445 


3689 


29 


257121.2 


865275H1 


3451 


3684 


29 


257121.2 


2373951H1


3456 


3695 


29 


257121.2 


g2329311


3465 


3849 


29 


257121.2 


5201726T6 


3463 


3823 


29 


257121.2 


3929807H1 


3466 


3775 


29 


257121.2 


4466537H1 


3466 


3723 


29 


257121.2 


2071630H1 


2781 


3036 


29 


257121.2 


2714703H1 


2800 


3056 


29 


257121.2 


630218H1 


2804 


3049 


29 


257121.2 


g2219708


2818 


3192 


29 


257121.2


5435973H1 


2403 


2648 


29 


257121.2 


g2189770


2409 


2575 


29 


257121.2 


3091740F6 


2434 


2892 


29 


257121.2 


3091740H1 


2436 


2721 


29 


257121.2 


4931376H1 


2460 


2729 


29 


257121.2 


5086828H1 


2466 


2717 


29 


257121.2 


4706266H1 


2474 


2701 


29 


257121.2 


4623814H1 


2489 


2746 


29 


257121.2 


1711729H1 


2493 


2682 


29 


257121.2 


3808352H1 


2493 


2707 


29 


257121.2 


g1999011


2496 


2841 


29 


257121.2 


2080982H1 


2600 


2772 


29 


257121.2 


4936350H1 


2607 


2800 


29 


257121.2 


2473816H1 


2634 


2761 


29 


257121.2 


2473816F6 


2534 


3077 


29 


257121.2 


2228396H1 


2540 


2791 


29 


257121.2 


4021166F6 


831 


1284 


29 


257121.2 


2051255H1 


838 


1128 


29 


257121.2


3001820F6 


870 


1041 


29 


257121.2 


3001820H1 


871 


1152 


29 


257121.2


5395159H1


935 


1013 


29 


257121.2 


492727H1 


1100 


1333 


29 


257121.2


2738108H1 


1122 


1378 


29 


257121.2 


3344641H1


1132 


1370 


29 


257121.2 


526778H1 


1241 


1504 


29 


257121.2 


310738H1 


1296 


1527 


29 


257121.2 


2754594H1 


1296 


1666 


29 


257121.2 


4055308H1 


3034 


3328 


29 


257121.2 


4021166T6 


3036 


3658 


29 


257121.2 


4466912H1 


3059 


3327 


29 


257121.2


6586646H1 


3067 


3312 


29 


257121.2


g1239705


3068 


3332 


29 


257121.2 


552938H1 


3096 


3365 


29 


257121.2 


3730966H1 


3111 


3436 


29 


257121.2 


g1444147


3121 


3612 


29 


257121.2 


5550666H1 


3172 


3413 


29 


257121.2 


5507424H1 


1840 


2028 


29 


257121.2 


5508406H1 


1840 


2087 


29 


257121.2 


3G00303H1 


1849 


2065 




257121.2








29 


257121.2 


5662088H1 


1864 


2082 


29


257121.2 


g3770956 


1868 


2224 


29 


257121.2 


g3802746 


1899 


2216 


29 


257121.2 


4892961H1 


1903 


2189 


29 


257121.2 


3479021H1


1924 


2149 


29 


257121.2 


1690528H1 


1938 


2168 


29 


257121.2 


4467223H1 


1963 


2169 


29 


257121.2 


5270694H1 


3395 


3651 


29 


257121.2 


g1719483


3568 


3855 


29 


257121.2 


g4531585 


3586 


3849 


29 


257121.2 


3720345H1 


3594 


3759 


29 


257121.2 


2569980H1 


3601 


3849 


29 


257121.2 


g1203034


3610 


3783 


29 


257121.2 


g4450994


3614 


3849 


29 


257121.2 


4901641H1 


3614 


3822 


29 


257121.2 


4960977H1 


2828 


3096 


29 


257121.2 


g2728590 


2834 


3223 


29 


257121.2 


3162492H1 


2842 


3125 


29 


257121.2 


g1719482


2858 


3289 


29 


257121.2 


g2211339


3417 


3848 


29 


257121.2 


g2659436


3422 


3854 


29 


257121.2 


g4296041


3428 


3849 


29 


257121.2 


5921004H1 


3429 


3679 


29 


257121.2 


2962704H1 


3431 


3741 


29 


257121.2 


5922274H1 


3431 


3715 


29 


257121.2 


g2329245


3433 


3703 


29 


257121.2 


g4328182


3435 


3849 


29 


257121.2 


1994936R6 


3438 


3848 


29


257121.2 


1994936T6 


3438 


3808 


29 


257121.2 


1994936H1 


3438 


3705 


29 


257121.2 


865275T1 


3439 


3794 


29 


257121.2 


453069H1 


1980 


2189 


29 


257121.2 


5982875H1 


2023 


2298 


29 


257121.2 


g2457731


2054 


2450 


29 


257121.2 


2883663H1 


2077 


2328 


29 


257121.2 


1786163H1 


2091 


2273 


29 


257121.2 


3433528H1 


2097 


2303 


29 


257121.2 


2588747H2 


2117 


2375 


29 


257121.2 


4715351H1 


2157 


2241 


29


257121.2 


4691328H1 


2198 


2462 


29 


257121.2 


2904069H1 


2226 


2537 


29 


257121.2 


4300967H1 


2236 


2508 


29 


257121.2 


4624918H1


2909 


3183 


29 


257121.2 


2374252H1 


2910 


3120 


29 


257121.2 


4909794H1 


2922 


3203 


29 


257121.2 


1741105H1 


2925 


3159 


29 


257121.2 


1741564H1 


2925 


3086 


29 


257121.2 


1741564R6 


2925 


3384 


29 


257121.2 


g531805


3303 


3701 


29 


257121.2 


3091740T6 


3306 


3809 


29 


257121.2 


2697055H1 


3308 


3597 


29 


257121.2 


1672059T6 


3344 


3809 


29 


257121.2 


g2818402


3359 


3858 


29 


257121.2 


2423136H1


3377 


3613 


29 


257121.2 


879657T1 


3382 


3809 


29 


257121.2 


g4329746


3472 


3849 


29 


257121.2 


g4243717


3477 


3856 


29 


257121.2 


3929886H1 


3478 


3769 


29 


257121.2 


g2619701


3487 


3892 


29 


257121.2 


g2835679


3490 


3849 


29 


257121.2 


g2216408


3496 


3849 


29 


257121.2 


3040078H1 


3500 


3780 


29 


257121.2 


g2261946


3498 


3849 


29 


257121.2 


g1622116


3502 


3855 


29 


257121.2 


g3190675


3510 


3848 


29 


257121.2 


g3281587


3517 


3852 


29 


257121.2


g1166385


3617 


3849 


29 


257121.2 


g3069539


3519 


3856 


29 


257121.2 


g3229168


3523 


3849 


29 


257121.2 


g2278678


3527 


3855 


29 


257121.2 


g2595803


3528 


3819 


29 


257121.2 


g1994549


3533 


3860 


29 


257121.2 


g613905


3177 


3513 


29 


257121.2 


590584TH1 


3186 


3500 


29 


257121.2 


1444396H1 


3186 


3461 


29 


257121.2 


g4187265


3195 


3670 


29 


257121.2 


2473816T6


3190 


3808 


29 


257121.2 


g4186245


3196 


3664 


29 


257121.2 


g3278632


3196 


3616 


29 


257121.2


g1606304


3213 


3436


29 


257121.2 


1658291H1 


3211 


3453 


29 


257121.2 


g1994550


2249 


2499 


29 


257121.2 


3053733H1 


2279 


2581 


29 


257121.2 


5641114H1 


2298 


2545 


29 


257121.2


996725R1 


2300 


2819 


29


257121.2 


995725H1 


2300 


2596 


29 


257121.2 


2173193F6 


2302 


2653 


29 


257121.2 


2173193H1 


2302 


2636 


29 


257121.2 


3474978H1 


2324 


2601 


29 


257121.2 


2106564H1 


2329 


2585 


29 


257121.2 


5171463H1 


2343 


2629 


29 


257121.2


3449691H1


2365 


2601 


29 


257121.2 


2824470T6 


2356 


2941 


29 


257121.2 


5546639H1


2387 


2526 


29 


257121.2 


5658068H1 


2396 


2642 


29 


257121.2


3771054H1 


1386 


1692 


29 


257121.2 


2714558T6 


1407 


1749 


29 


257121.2 


g2037431


1419 


1693 


29 


257121.2


2436316H1 


1459 


1650 


29 


257121.2 


g953893






29 


257121.2 


6075654H1 


1487 


1796 


29 


257121.2 


g1065687


1504 


1601 


29 


257121.2 


g2034372


1557 


1818 


29 


257121.2 


1235661H1


1567 


1836 


29 


257121.2 


2824470F6 


1673 


1997 


29 


257121.2 


2824470H1 


1573 


1809 


29 


257121.2 


g4509712


1585 


1995 


29 


257121.2


g3700826


1687 


1927 


29 


257121.2 


4876427H1 


1618 


1807 


29 


257121.2 


3297894H1 


1657 


1904 


29 


257121.2 


g3764184


1663 


1795 


29 


257121.2 


4306167H1


1689 


1809 


29 


257121.2 


446319T6 


1703 


2175 


29 


257121.2 


5984668H1 


1802 


2091 


29 


257121.2 


5594456H1 


1805 


1931 


29 


257121.2 


3202943H1 


1818 


2073 


29 


257121.2 


309745H1 


3396 


3648 


29 


257121.2 


4652103H1 


3402 


3556 


29


257121.2 


g3058761 


3407 


3849 


29 


257121.2 


g3961167


3409 


3844 


29 


257121.2 


g3178632


3413 


3849 


29 


257121.2 


5645258H1 


190 


At& 


29 


257121.2 


g751765


209 


449 


29 


257121.2


g751766


344 


636 


29 


257121.2 


2478708H1 


363 


607 


29 


257121.2 


5544313H1 


370 


582 


29 


257121.2 


1911275H1 


460 


725 


29 


257121.2 


1437425F1 


511 


985 


29 


257121.2 


1437426H1 


511 


734 


29 


257121.2 


1437425H1 


511 


736 


29 


257121.2 


3697654H1 


541 


823 


29 


257121.2 


3333937H1 


589 


769 


29 


257121.2 


2639996H1 


706 


952 


29 


257121.2 


446319R6


788 


1254 


29 


257121.2 


446319H1 


788 


1039 


29 


257121.2 


3068429F7 


1


419 


29 


257121.2 


3068429H1 


1 


291 


29 


257121.2 


2821269H1 


126 


301 


29 


257121.2 


3160523H1 


3217 


3501 


29 


257121.2 


5024976H1 


3222 


3513 


29 


257121.2 


3894771H1 


3241 


3546 


29 


257121.2 


3360915H1


3240 


3354 


29 


257121.2 


1427954T6 


3250 


3809 


29 


257121.2 


1741564T6 


3251 


3811 


29 


257121.2 


1892219H1 


3250 


3516 


29 


257121.2 


2173193T6 


3255 


3819 


29 


257121.2 


1861145H1 


3261 


3586 


29 


257121.2 


1531024H1 


3769 


3849 


29 


257121.2 


2608136H1


3540 


3797 


29 


257121.2 


2608136T6


3533 


3805 


29 


257121.2 


2608136F6


3540 


3853 


29 


257121.2 


g1210010


3540 


3850 


29 


257121.2 


g2805071


3540 


3827 


29 


257121.2 


g2218792


3542 


3849 


29 


257121.2 


2721666H1 


3563 


3814 


29 


257121.2 


4901093H1 


3614 


3850 


29 


257121.2 


g3092920


3617 


3818 


29 


257121.2 


4347921H1


3623 


3877 


29 


257121.2 


3109370H1 


3625 


3723 


29 


257121.2 


g1757345


3644 


3855 


29 


257121.2 


g1211126 


3645 


3867 


29 


257121.2 


g1505472 


3647 


3864 


29 


257121.2 


5565971H1 


3650 


3853 


29 


257121.2 


4613032H1 


3666 


3856 


29 


257121.2 


892266H1 


3672 


3830 


29 


257121.2 


3729288H1 


3676 


3853 


29 


257121.2 


g1813011


3677 


3855 


29 


257121.2 


g2063050


3680 


3849 


29 


257121.2 


3729209T1 


3681 


3810 


29 


257121.2 


g2063735


3707 


3849 


29 


257121.2 


g2882056


3741 


3849 


29 


257121.2 


1547677H1 


3769 


3844 


29 


257121.2 


6024576H1 


2558 


2843 


29 


257121.2 


4909221H1 


2607 


2901 


29 


257121.2 


1427954H1 


2634 


2884 


29 


257121.2 


5170996H1 


2648 


2874 


29 


257121.2 


1427954F6 


2665 


3120 


29 


257121.2 


946401H1


2666 


2926 


29 


257121.2 


1374579H1 


2670 


2923 


29 


257121.2 


2261955H1 


2684 


2948 


29 


257121.2 


3094766H1 


2691 


2994 


29 


257121.2 


2741005H1 


2693 


2978 


29 


257121.2 


4635067H1 


2704 


2978 


29 


257121.2 


1007503H1 


2718 


3048 


29 


257121.2 


5284328H1 


2720 


2957 


29 


257121.2 


5064384H1 


2731 


2941 


29 


257121.2 


1672059H1 


2742 


2924 


29 


257121.2 


1672008H1 


2742 


2975 


29 


257121.2 


1672059F6 


2742 


3123 


29 


257121.2 


2410045H1 


2774 


3011 


37 


84399.1 


2520472H1 


1 


226 


37 


84399.1 


g4148125


155 


499 


38 


350044.1 


3110061F7


1 


276 


38 


350044.1 


3110061H1


3 


289 


38 


350044.1 


4308349H1 


158 


426 


38 


350044.1 


4308349F6 


158 


587 


38 


350044.1 


5333549H1 


185 


413 


38 


350044.1 


3399811H1


405 


650 


38 


350044.1 


2288313H1 


496 


629 


38 


350044.1 


4637040H1 


583 


844 


38 


350044.1 


4637040F6 


582 


916 


38 


350044.1


4308349T6 


662 


1020 


38 


350044.1 


308659H1 


664 


884 


38 


350044.1 


g1803082


832 


926 


38 


350044.1 


3977826H1 


852 


968 


39 


441329.2 


g3761157


1 


412 


39 


441329.2 


g1470664


342 


578 


39 


441329.2 


g1395945


342 


644 


39 


441329.2 


4327736H1 


355 


622 


39 


441329.2 


132848H1 


360 


540 


39 


441329.2 


132849R6 


360 


805 


39 


441329.2 


131890H1 


360 


563 


39 


441329.2 


131890R6 


360 


802 


39 


441329.2 


g1165579


361 


542 


39 


441329.2 


g1928191 


371 


650 


39 


441329.2 


4204432F6 


374 


765 


39 


441329.2 


g1734964


378 


763 


39 


441329.2 


3357071H1 


701 


979 


39 


441329.2 


3357071F6


701 


1118 


39 


441329.2 


131890T6 


1077 


1235 


39 


441329.2 


g1733361


1084 


1235 


39 


441329.2 


132849T6 


1123 


1236 


39 


441329.2 


4204432T6 


1128 


1235 


40 


442401.2 


3349655H1 


1 


327 


40 


442401.2 


4309840H1 


10 


304 


40 


442401.2 


4349106H1 


25 


238 


40 


442401.2 


5043378H1 


42 


296 


40 


442401.2 


4789236H1 


44 


123 


40 


442401.2 


5320882H1 


45 


180 


40 


442401.2 


2551237H1 


64 


319 


40 


442401.2 


4664370H1 


71 


322 


40 


442401.2 


3510753H1 


75 


389 


40 


442401.2 


693783H1 


80 


266 


40 


442401.2 


693783R6 


83 


554 


40 


442401.2 


3865603H1 


88 


388 


40 


442401.2 


2289862H1 


86 


326 


40 


442401.2 


3681694H1 


92 


386 


40 


442401.2 


693783T6 


128 


736 


40 


442401.2 


519767H1 


511 


741 


41 


444933.2 


3492265H1 


7 


310 


41 


444933.2 


g1501708


7 


263 


41 


444933.2 


3155609H1 


7 


99 


41 


444933.2 


g1474433 


9 


386 


41 


444933.2 


3118539H1 


9 


310 


41 


444933.2 


3295816H1 


15 


277 


41 


444933.2 


1592931H1


29 


229 


41 


444933.2 


1592931F6


29 


395 


41 


444933.2 


1592931T6


43 


579 


41 


444933.2 


3436778H1 


1 


229 


42 


481129.4 


999203H1 


234 


552 


42 


481129.4 


999203T1 


234 


708 


42 


481129.4 


g3278054


236 


751 


42 


481129.4 


3868296H1 


240 


560 


42 


481129.4 


g4522433 


240 


752 


42 


481129.4 


3137701H1


240 


575 


42 


481129.4 


g3417930 


241 


746 


42 


481129.4 


3868754H1 


240 


670 


42 


481129.4 


g4070376 


243 


752 


42 


481129.4 


715742H1 


248 


618 


42 


481129.4 


4950629H1 


245 


592 


42 


481129.4 


g3644668 


250 


755 


42 


481129.4 


1577433H1 


250 


516 


42 


481129.4 


g3446453 


265 


747 


42 


481129.4 


2066077H1 


267 


589 


42 


481129.4 


g2288119 


272 


746 


42 


481129.4 


3358220H1 


293 


617 


42 


481129.4 


g3797800 


295 


747 


42 


481129.4 


g2113046


296 


749 


42 


481129.4 


g2740428 


301 


747 


42 


481129.4 


g2268545 


312 


747 


42 


481129.4 


g4087282 


314 


750 


42 


481129.4 


g2017050 


320 


622 


42 


481129.4 


g1924476


320 


647 


42 


481129.4


g3735602 


323 


748 


42 


481129.4 


g4532931 


324 


747 


42 


481129.4 


g3734942


324 


751 


42 


481129.4 


g2737174 


325 


747 


42 


481129.4 


g3960674 


330 


751 


42 


481129.4 


g2252043 


326 


746 


42 


481129.4 


g3330470 


342 


747 


42 


481129.4 


g3960381 


342 


765 


42 


481129.4 


g4452123 


361 


746 


42 


481129.4


g3678389 


352 


747 


42 


481129.4


1632555H1 


353 


564 


42 


481129.4 


1632539H1 


363 


671 


42 


481129.4 


4228443H1 


363 


686 


42 


481129.4 


g3108753


369 


749 


42 


481129.4 


g2319166 


368 


747 


42 


481129.4 


g3755468 


374 


764 


42 


481129.4


g4086634


397 


747 


42 


481129.4


g2322475 


404 


746 


42 


481129.4 


g3750698 


406 


748 


42 


481129.4 


g2021853 


411 


746 


42 


481129.4 


3099690H1 


423 


747 


42 


481129.4 


4126491H1


425 


736 


42 


481129.4 


5107680H1 


443 


746 


42 


481129.4 


1253080F1 


454 


746 


42 


481129.4 


2741014H1 


456 


747 


42 


481129.4 


g4086632 


472 


747 


42 


481129.4 


3381102H1 


195 


363 


42 


481129.4 


3136744H1 


132 


344 


42 


481129.4 


2485548H1 


37 


110 


42 


481129.4 


323853bHl 


36 


320 


42 


481129.4 


5332434H1 


132 


262 


42 


481129.4 


1743373H1 


37 


342 


42 


481129.4 


2398069H1 


30 


282 


42 


481129.4 


6299970H1 


36 


291 


42 


481129.4 


4857940H1 


132 


361 


42 


481129.4 


982689H1 


28 


284 


42  481129.4  1684768H1   34    296
42  481129.4  4901916H1   29    347
42  481129.4  4d409S9H1   25    317
42  481129.4  1275760H1   195   375
42  481129.4  g1975121    132   336
42  481129.4  534025H1    30    348
42  481129.4  3555262H1   24    317
42  481129.4  1504151H1   30    329
42  481129.4  4662858H1   195   379
42  481129.4  4638612H1   132   320
42  481129.4  4801179H1   30    310
42  481129.4  1970489H1   35    306
42  481129.4  2497883H1   30    272
42  481129.4  2557615H1   132   324
42  481129.4  4079990H1   195   386
42  481129.4  3555091H1   31    350
42  481129.4  g1635875    160   388
42  481129.4  3010470H1   132   368
42  481129.4  g1685660    488   846
42  481129.4  2716085H1   492   747
42  481129.4  g2669356    508   750
42  481129.4  g2035379    508   768
42  481129.4  g4330124    517   749
42  481129.4  g1125228    503   768
42  481129.4  g1237906    515   746
42  481129.4  1253080H1   547   746
42  481129.4  1924052H1   559   747
42  481129.4  1444584H1   557   747
42  481129.4  3089459H1   563   755
42  481129.4  1952233H1   596   746
42  481129.4  g2269237    609   746
42  481129.4  2058183R6   620   747
42  481129.4  2058183H1   620   747
42  481129.4  g3321468    M7    746
42  481129.4  g2277314    629   747
42  481129.4  4363320H1   652   742
42  481129.4  g2752791    670   746
42  481129.4  3221190H1   49    104
42  481129.4  529956H1    197   303
42  481129.4  1781280H1   202   315
42  481129.4  586079H1    35    104
42  481129.4  644429H1    37    104
42  481129.4  4713384H1   32    97
42  481129.4  1434389H1   30    104
42  481129.4  2286414H1   30    104
42  481129.4  1648951H1   34    111
42  481129.4  3589355H1   36    111
42  481129.4  1916331H1   195   338
42  481129.4  880829H1    28    97
42  481129.4  2475170H1   195   350





42  481129.4  g1301383    263   352
42  481129.4  4612347H1   169   315
42  481129.4  116571H1    25    83
42  481129.4  g1023181    141   205
42  481129.4  835155H1    37    333
42  481129.4  4153116H1   132   329
42  481129.4  4300976H1   132   343
42  481129.4  962735H1    132   324
42  481129.4  4904613H2   132   339
42  481129.4  4904321H1   195   362
42  481129.4  4130260H1   132   334
42  481129.4  634025F1    1     184
42  481129.4  5165213H2   4     265
42  481129.4  g3085628    1     187
42  481129.4  g2836097    1     184
42  481129.4  g2397633    1     179
42  481129.4  g2669712    1     181
42  481129.4  811971T1    1     191
42  481129.4  811971H1    1     67
42  481129.4  g1578417    1     184
42  481129.4  g3016146    1     253
42  481129.4  g1425361    1     232
42  481129.4  g2237584    1     231
42  481129.4  g4072976    1     234
42  481129.4  g3076557    1     252
42  481129.4  3716227H1   1     125
42  481129.4  g2350483    1     232
42  481129.4  g1289724    1     184
42  481129.4  4830687H1   14    227
42  481129.4  1270745H1   1     82
42  481129.4  g3423624    1     187
42  481129.4  g1685572    1     232
42  481129.4  g2907402    1     79
42  481129.4  g2987865    1     235
42  481129.4  4664951H1   1     115
42  481129.4  g1114304    1     162
42  481129.4  1694416H1   1     78
42  481129.4  g4076242    1     234
42  481129.4  g2806310    1     181
42  481129.4  g3110520    1     231
42  481129.4  g3178699          235
42  481129.4  5691870H1         56
42  481129.4  g2408708          232
42  481129.4  g3070820          231
42  481129.4  g1194184          183
42  481129.4  g3400227          296
42  481129.4  g1080367          151
42  481129.4  g2768589          231
42  481129.4  g3804766          184
42  481129.4  g3190855          232





42  481129.4  g1087452    1     267
42  481129.4  4065091H1   1     169
42  481129.4  g2838959    1     236
42  481129.4  3091849H1   1     226
42  481129.4  g2263870    1     183
42  481129.4  g2279285    1     239
42  481129.4  2755370H1   1     184
42  481129.4  2757311H1   1     184
42  481129.4  863475H1    1     192
42  481129.4  g1689421    1     183
42  481129.4  5528923H1   19    211
42  481129.4  g1506021    1     184
42  481129.4  2467514H1   10    261
42  481129.4  2943819H1   15    312
42  481129.4  g1043884    16    196
42  481129.4  4461552H1         231
42  481129.4  g1281669    1     531
42  481129.4  g727471     1     234
42  481129.4  4889929H1   29    314
42  481129.4  4985113H1   29    354
42  481129.4  1488272H1   5     319
42  481129.4  2450834F6   6     558
42  481129.4  2450834H1   6     287
42  481129.4  g2027943    7     201
42  481129.4  2394311H2   8     123
42  481129.4  3372442H1   1     258
42  481129.4  g3307272    28    238
42  481129.4  3773060H1   38    366
42  481129.4  4125777H1   10    329
42  481129.4  5159564H1   12    302
42  481129.4  3233460H1   12    297
42  481129.4  5552762H1   12    315
42  481129.4  593632H1    12    123
42  481129.4  4513470H1   13    313
42  481129.4  5n5162H1    44    356
42  481129.4  5039159H2   15    286
42  481129.4  2134140H1   16    328
42  481129.4  6943202H1   44    374
42  481129.4  027622R6    17    589
42  481129.4  027622H1    17    197
42  481129.4  3115953H1   46    372
42  481129.4  3489959H1   46    382
42  481129.4  5166761H1   44    322
42  481129.4  3626341H1   18    386
42  481129.4  3645933H1   48    327
42  481129.4  3159541H1   19    348
42  481129.4  2741058H1   52    282
42  481129.4  1475821H1   50    252
42  481129.4  4567661H1   22    350
42  481129.4  3601462H1   22    394





42  481129.4  4595577H1   53    283
42  481129.4  3819717H1   22    426
42  481129.4  4975524H1   22    359
42  481129.4  2084168H1   50    348
42  481129.4  5529123H1   18    212
42  481129.4  5700770H1   52    344
42  481129.4  5701471H1   52    355
42  481129.4  g3134982    46    234
42  481129.4  4979603H1   52    355
42  481129.4  3202264H1   23    321
42  481129.4  4938763H1   16    130
42  481129.4  4202906H1   8     274
42  481129.4  526404H1    22    322
42  481129.4  2665708H1   21    280
42  481129.4  2941705H1   22    327
42  481129.4  g1321143    11    346
42  481129.4  g2021854    22    337
42  481129.4  1833977H1   22    134
42  481129.4  2100632H1   12    286
42  481129.4  3675543H1   12    320
42  481129.4  3186510H1   13    400
42  481129.4  3229333H1   14    321
42  481129.4  4885571H1   19    306
42  481129.4  3074685H1   16    321
42  481129.4  3663968H1   16    329
42  481129.4  g2154555    1     29
42  481129.4  863475R1    1     192
42  481129.4  423G048H1   1     176
42  481129.4  3693558H1   18    328
42  481129.4  g1648658    1     23
42  481129.4  584205H1    18    309
42  481129.4  4227248H1   21    324
42  481129.4  g1043979    28    415
42  481129.4  g2142101    1     24
42  481129.4  4203722H1   22    321
42  481129.4  4174825H1   22    337
42  481129.4  5115519H1   23    321
42  481129.4  3133373H1   25    334
42  481129.4  1275760F6   1     147
42  481129.4  1275760F1   1     76
42  481129.4  2051350H1   27    350
42  481129.4  4341005H1   27    398
42  481129.4  3134845H1   1     260
42  481129.4  3484150H1   1     316
42  481129.4  4584348H1   1     279
42  481129.4  g2004902    1     283
42  481129.4  4374817H1   1     288
42  481129.4  g1967486    1     183
42  481129.4  4147580H1   1     267
42  481129.4  g1425470    1     78







ilQl 1 Oft A 


A07/\A0AU 1 




OU 1 


42 


iio 1 ^ nts A 

4on29.4 


ftDftACilUl 

Uo2oo4nl 


1 C 
10 


171 
1 / i 


42 


J n 1 1 nn a 

481129.4 


^oftftoo ^ z. 

g3u03216 




ouo 


42 


481129.4 


g 1648659 




1 

loo 


42 


481129.4 


iieXOOXl LIT 

4563261 Ml 




zoo 


42 


481129.4 


4563351 HI 


1 o 

12 


136 


42 


481129.4 


116571F1 




zoo 


42 


481129.4 


g4372584 




OO A 

234 


42 


481129.4 


3994691 HI 


23 


ooo 

288 


42 


481129.4 


g2437312 




188 


42 


481129.4 


136145F1 




1 Oil 

184 


42 


481129.4 


2768229H1 




256 


42 


iio 1 1 nn A 

481129.4 


5852546H1 




z/z 


42 


481129.4 


C^a A A ilftOU 1 

5844492H1 


6 


z/y 


42 


481 1 29.4 


1/^/ room 


8 


o>in 


42 


481 129.4 


IftCftftftl LJI 

IVOU091nl 


0 


ZOh 


42 


ii o 1 1 na A 

481129.4 


OOft >lftTiLl_l 1 


in 


ZVU 


42 


AO 1 1 Oft A 

481 129.4 


4oa4oooni 


in 


Z/0 


42 


481129.4 


ee iiftftfti uo 

5840991 n2 


1 1 


OUO 


42 


il O 1 1 Oft A 

481 129.4 


CO itnoftftLio 


1 1 


0 14 


An 

42 


il Q ^ 1 Oft A 

481 129.4 


45591U5n 1 


lO 


ZVO 


42 


iio 1 y Oft A 

481129.4 


o71 lOoHl 


io 


IZO 


42 


481129.4 


ilftftOO>IDt-ll 

4998348H1 


1^ 


ZVo 


42 


481129.4 


871105R1 


12 


IzO 


42 


481129.4 


il ^^ft il B 1 

4770486H1 


13 


OQ"7 

zo/ 


42 


481129.4 


O >tOX OT 1 L1 1 

3426271 HI 


14 


one 
29o 


42 


481129.4 


3273645H1 


14 


298 


42 


481129.4 


il CftO T OftU 1 

4502129H1 


1 A 

14 


ono 
zUz 


42 


481129.4 


O Gift's OOilLJl 

3403224H1 


15 


zzl 


An 

42 


il O 1 1 Oft A 

481 129.4 


lOl lUloH 1 


1 ti 

IO 


AO 
Ot 


42 


VI O 1 1 Oft A 

481 129.4 


ilCftOOOftU 1 

4502229H1 


Io 


Zoo 


42 


481 129.4 


Af\AnA*iIJi-l 1 

4U42436H1 


o 

O 


Zoo 


42 


iio 1 1 on A 

481129.4 


oo454o4nl 


m 
lU 


OO/ 


Af\ 

42 


ilO 1 1 Oft A 

481 129.4 


o7ZU4O0n I 




04U 


An 

42 


ilOl 1 Oft A 

481 129.4 


IWoO/ iHl 




IZO 


An 

42 


ilDl lOft A 


0000^4on 1 




oon 

ZVU 




AQ^ too A 


'aoni 1A7H1 

oou 1 lo/n 1 


zz 




An 

42 


AO! 1 Oft A 

4ol 12V.4 


7>IA^CQLI1 

/4oooon 1 


IZ 


z*w 


An 

42 


ii o 1 1 on A 

481129.4 


gioooooo 




>li11 


An 

42 . 


AO t TOft A 

4o 1129.4 


ZOo/ooVn 1 


1 

10 


ouo 


42 


ilO 1 1 Oft A 

481129,4 


1 TOQOftCLIl 

172o2uoH 1 


1 *5 


ono 
zuv 


42  481129.4  2674637H1   15    271
42  481129.4  4865434H1   14    293
42  481129.4  2943583H2   15    322
42  481129.4  116571R1    1     182
42  481129.4  g1320320    1     363
42  481129.4  2801342H1   5     264
42  481129.4  5528955H1   21    208
42  481129.4  3618142H1   24    377
42  481129.4  122118H1    17    164





42  481129.4  3580438H1   69    320
42  481129.4  g1638276    25    435
42  481129.4  833841T1    40    708
42  481129.4  3997421H1   26    345
42  481129.4  4110340H1   56    374
42  481129.4  2403135H1   58    320
42  481129.4  g1969685    58    472
42  481129.4  3026745H1   57    366
42  481129.4  833841H1    40    406
42  481129.4  6681519H1   42    198
42  481129.4  3987429H1   58    376
42  481129.4  5177261H2   70    330
42  481129.4  880829R1    29    768
42  481129.4  g1954123    46    346
42  481129.4  755972H1    59    323
42  481129.4  755972R1    59    709
42  481129.4  3944829H1   77    366
42  481129.4  1646083H1   58    262
42  481129.4  g1496701    1     184
42  481129.4  962736R2    78    716
42  481129.4  2561965H1   32    359
42  481129.4  g2008330    80    340
42  481129.4  881179T1    30    687
42  481129.4  534025R1    32    572
42  481129.4  871106T1    54    123
42  481129.4  g1956736    64    347
42  481129.4  4800988H1   83    224
42  481129.4  g1685765    83    403
42  481129.4  1230023H1   32    172
42  481129.4  1805729H1   27    189
42  481129.4  1320892H1   38    321
42  481129.4  1599902H1   38    290
42  481129.4  4688734H1   85    374
42  481129.4  g1955622    70    445
42  481129.4  3517903H1   40    255
42  481129.4  136146R1    69    590
42  481129.4  4979660H1   40    354
42  481129.4  1741320H1   42    338
42  481129.4  5117Q19H1   88    386
42  481129.4  g1924567    88    536
42  481129.4  2343536H1   89    355
42  481129.4  4536803H1   90    398
42  481129.4  g2158823    96    655
42  481129.4  g1958858    97    676
42  481129.4  g828123     93    501
42  481129.4  g3988656    97    621
42  481129.4  g3096874    99    539
42  481129.4  g3178165    107   566
42  481129.4  g4079967    110   541
42  481129.4  465693H1    66    199





42  481129.4  g3003210    111   467
42  481129.4  5265746H1   120   374
42  481129.4  966316H1    118   374
42  481129.4  1458085H1   122   396
42  481129.4  1457484H1   122   305
42  481129.4  2109149H1   122   442
42  481129.4  2109149R6   123   704
42  481129.4  g1243536    77    215
42  481129.4  958148H1    122   431
42  481129.4  5020691T1   124   705
42  481129.4  g1087561    133   459
42  481129.4  g1958920    135   700
42  481129.4  g1147415    139   444
42  481129.4  g3890878    103   203
42  481129.4  2401220H1   144   390
42  481129.4  3223013H1   149   494
42  481129.4  1680890H1   149   361
42  481129.4  g1298313    155   667
42  481129.4  828345H1    143   226
42  481129.4  3441046H1   159   442
42  481129.4  2109149r6   163   707
42  481129.4  2004957H1   176   484
42  481129.4  6063286H1   180   511
42  481129.4  5020591H1   179   501
42  481129.4  2450834T6   182   707
42  481129.4  027622T6    183   830
42  481129.4  5597771H1   189   470
42  481129.4  821831T6    192   705
42  481129.4  821831R1    192   746
42  481129.4  821831R6    192   746
42  481129.4  821631F1    192   757
42  481129.4  g2252207    199   748
42  481129.4  1457484R1   199   746
42  481129.4  g3736470    203   746
42  481129.4  g1295465    204   757
42  481129.4  5185783H1   205   496
42  481129.4  g2115256    211   352
42  481129.4  4085165H1   215   541
42  481129.4  g1577977    221   716
42  481129.4  g3593921    222   750
42  481129.4  g3429019    222   747
42  481129.4  5221813H2   225   517
42  481129.4  5863560H1   232   539
42  481129.4  999208H1    234   544
42  481129.4  999203R1    234   750
42  481129.4  g3649202    235   753
42  481129.4  g2265393    234   751
42  481129.4  4049323H1   235   584
43  481999.1  1255239H1   838   1104
43  481999.1  4913234H1   397   512





43  481999.1  1724376H1   864   932
43  481999.1  1509213H1   819   1012
43  481999.1  1849705H1   281   501
43  481999.1  1509213F6   819   1104
43  481999.1  1476972F6   260   574
43  481999.1  g2195301    866   1104
43  481999.1  g986997     864   941
43  481999.1  880602R1    397   512
43  481999.1  g1980533    1     283
43  481999.1  g2195281    987   1104
43  481999.1  5526658H1   780   991
43  481999.1  880602H1    394   457
43  481999.1  1255206H1   838   1104
43  481999.1  1509479F6   819   1104
43  481999.1  3457008H1   397   481
43  481999.1  3641330H1   723   1014
43  481999.1  4181741H1   717   799
43  481999.1  3384087F6   604   857
43  481999.1  4760678H1   1     286
43  481999.1  1509221H1   819   1006
43  481999.1  1509455H1   819   997
43  481999.1  2603491H1   860   1104
43  481999.1  170070H1    797   982
43  481999.1  3427420F6   521   604
43  481999.1  4760678F6   1     499
43  481999.1  4913234F6   235   670
43  481999.1  4072789H1   237   499
43  481999.1  1524891F6   269   890
43  481999.1  398973R1    305   830
43  481999.1  4972169H1   420   695
43  481999.1  4357304H1   435   710
43  481999.1  4181741F6   718   1107
43  481999.1  3123901F6   380   434
43  481999.1  5585642H1   650   868
43  481999.1  1724376F6   864   941
43  481999.1  2938884H1   864   941
43  481999.1  3369038H1   429   706
43  481999.1  g1148977    1     387
43  481999.1  g1971119    842   1104
43  481999.1  g1761297    704   1005
43  481999.1  1476972H1   260   455
43  481999.1  4211180H1   642   873
43  481999.1  g3665013    371   434
43  481999.1  6153580H1   813   1066
43  481999.1  g2141617    440   513
43  481999.1  1509479H1   819   997
46  338992.1  5510632H1   1     202
46  338992.1  4782751F6   127   187
46  338992.1  4782751H1   127   372
46  338992.1  2987413H1   192   478


46  338992.1  6064208H1   204   465
46  338992.1  g1984325    272   505
46  338992.1  g3239921    294   653
46  338992.1  g2026718    300   591
46  338992.1  g2816869    321   569
46  338992.1  4760384H1   492   680
46  338992.1  4760384F6   492   965
46  338992.1  1305496F6   802   1270
46  338992.1  1305496H1   802   946
46  338992.1  3617505H1   823   1024
46  338992.1  g875546     878   1132
46  338992.1  g669800     877   1199
46  338992.1  g874649     878   1180
46  338992.1  g771213     1011  1062
46  338992.1  3211592H1   1056  1292
46  338992.1  093259H1    1081  1322
46  338992.1  926880H1    1123  1363
46  338992.1  1366443H1   1137  1385
46  338992.1  1366443R6   1137  1466
46  338992.1  4960778H1   1160  1431
46  338992.1  4625867H1   1224  1487
46  338992.1  3383444H1   1259  1504
46  338992.1  5640688H1   1417  1657
46  338992.1  4030718H1   1568  1830
46  338992.1  4030718F6   1568  1945
46  338992.1  2742284H1   1620  1870
51  206603.1  g3278540    1     466
51  206603.1  g4390504    25    492
51  206603.1  g3870096    41    504
51  206603.1  g1109407    277   604
51  206603.1  5631413H1   367   608
51  206603.1  5631413F6   367   812
51  206603.1  1992082H1   531   726
51  206603.1  6092061H1   626   906
52  435694.2  4936540H1   1112  1364
52  435694.2  5021371H1   1139  1417
52  435694.2  3726226H1   1155  1460
52  435694.2  g2ul5121    1170  1366
52  435694.2  456419H1    1185  1425
52  435694.2  461227R6    1185  1685
52  435694.2  456301H1    1185  1433
52  435694.2  461227H1    1185  1446
52  435694.2  460920H1    1185  1432
52  435694.2  461106H1    1185  1440
52  435694.2  457987H1    1185  1424
52  435694.2  458221H1    1185  1435
52  435694.2  454834R1    1186  1705
52  435694.2  461049H1    1185  1425
52  435694.2  464834H1    1186  1424
52  435694.2  4957324H1   266   486


52  435694.2  3390485H1   267   486
52  435694.2  1437407H1   243   486
52  435694.2  S902323H1   243   486
52  435694.2  3676724H1   271   486
52  435694.2  1437407F6   243   486
52  435694.2  1436837H1   243   480
52  435694.2  6900454H1   243   486
52  435694.2  2706491H1   270   566
52  435694.2  2837043H1   244   486
52  435694.2  3165547H1   246   486
52  435694.2  5903233H1   247   568
52  435694.2  3687630H1   251   566
52  435694.2  4268779H1   252   486
52  435694.2  3460048H1   251   487
52  435694.2  3077826H1   253   501
52  435694.2  4121058H1   271   486
52  435694.2  4900932H1   273   562
52  435694.2  3615215F6   280   731
52  435694.2  832322H1    252   486
52  435694.2  3615215H1   280   588
52  435694.2  2406515H1   254   466
52  435694.2  3141771H1   327   608
52  435694.2  3615215T6   351   720
52  435694.2  g1774842    351   728
52  435694.2  3932194H1   351   486
52  435694.2  4003539H1   351   589
52  435694.2  g587196     351   593
52  435694.2  1285417F6   400   708
52  435694.2  1286417H1   400   637
52  435694.2  2763950H1   502   762
52  435694.2  2415062H1   512   710
52  435694.2  2415070H1   512   702
52  435694.2  3860483H1   517   813
52  435694.2  g1766458    520   850
52  435694.2  g3076400    519   940
52  435694.2  823127H1    566   831
52  435694.2  g3190613    562   745
52  435694.2  2926224H2   688   999
52  435694.2  2708490H1   690   942
52  435694.2  g4311248    702   1176
52  435694.2  g4389608    707   1164
52  435694.2  g3700432    738   1174
52  435694.2  g1773867    738   1172
52  435694.2  128541n6    768   904
52  435694.2  g4111706    797   1172
52  435694.2  774013H1    826   1036
52  435694.2  1454843H1   899   1163
52  435694.2  1450887F1   899   1172
52  435694.2  3742523H1   896   1163
52  435694.2  1460887H1   899   1152


52  435694.2  3742566H1   899   1163
52  435694.2  145484316   913   1128
52  435694.2  4710468H1   961   1249
52  435694.2  4977443H1   1006  1268
52  435694.2  3237593H1   1060  1304
52  435694.2  4705379T6   1071  1144
52  435694.2  4853112H1   1100  1295
52  435694.2  2666646T6   1628  2175
52  435694.2  2959842H1   1635  1861
52  435694.2  2963934T6   1633  2162
52  435694.2  3639070H1   1655  1967
52  435694.2  4322530H1   1660  1782
52  435694.2  5062716H1   1673  1795
52  435694.2  1710033H1   1703  1959
52  435694.2  5021371T1   1716  2162
52  435694.2  3894718H1   1752  1968
52  435694.2  2151922H1   1765  2018
52  435694.2  g1646818    1810  2162
52  435694.2  2323908H1   1816  1878
52  435694.2  g4070415    1817  2162
52  435694.2  g4175351    1821  2162
52  435694.2  g3596864    1824  2162
52  435694.2  g3434496    1843  2162
52  435694.2  g1776010    1845  2162
52  435694.2  g1645819    1872  2162
52  435694.2  g1164665    1885  2262
52  435694.2  4701327H1   1906  2173
52  435694.2  g1766363    1910  2162
52  435694.2  2253564R6   1939  2171
52  435694.2  2253564H1   1939  2196
52  435694.2  4640530H1   1951  2181
52  435694.2  4640144H1   1953  2199
52  435694.2  1833068H1   1959  2171
52  435694.2  g3003709    1979  2171
52  435694.2  g3959668    1980  2162
52  435694.2  g2358889    2057  2171
52  435694.2  g4264824    2059  2162
52  435694.2  474448H1    2096  2171
52  435694.2  4615243H1   1     250
52  435694.2  2633844H1   21    246
52  435694.2  g3164852    56    491
52  435694.2  4270549H1   86    338
52  435694.2  3330903H1   114   331
52  435694.2  4315068H1   219   423
52  435694.2  4315053H1   221   486
52  435694.2  4706408H1   242   486
52  435694.2  1436837F1   243   695
52  435694.2  6900458H1   243   486
52  435694.2  6903501H1   242   486
52  435694.2  6903201H1   243   486


52  435694.2  5426328H1   1199  1361
52  435694.2  g1989927    1223  1525
52  435694.2  g1989319    1223  1632
52  435694.2  g572779     1276  1621
52  435694.2  4611420H1   1351  1610
52  435694.2  4243682H1   1370  1625
52  435694.2  5868132H1   1375  1642
52  435694.2  792533H1    1412  1646
52  435694.2  g1775555    1428  1811
52  435694.2  1469570H1   1472  1684
52  435694.2  1469570F6   1472  1787
52  435694.2  2461531H1   1485  1710
52  435694.2  g2220695    1488  1893
52  435694.2  g1196070    1498  1783
52  435694.2  3561975H1   1527  1843
52  435694.2  1830391H1   1583  1847
1847 
CLAIMS

What is claimed is: 

1. An isolated polynucleotide comprising a polynucleotide sequence selected from the group
consisting of:

a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52,

b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-52,

c) a polynucleotide sequence complementary to a),

d) a polynucleotide sequence complementary to b), and

e) an RNA equivalent of a) through d).

2. An isolated polynucleotide of claim 1, comprising a polynucleotide sequence selected from
the group consisting of SEQ ID NO:1-52.

3. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide 
of claim 1. 

4. A composition for the detection of expression of diagnostic and therapeutic polynucleotides
comprising at least one of the polynucleotides of claim 1 and a detectable label.

5. A method for detecting a target polynucleotide in a sample, said target polynucleotide
having a sequence of a polynucleotide of claim 1, the method comprising:

a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction
amplification, and

b) detecting the presence or absence of said amplified target polynucleotide or fragment
thereof, and, optionally, if present, the amount thereof.

6. A method for detecting a target polynucleotide in a sample, said target polynucleotide
comprising a sequence of a polynucleotide of claim 1, the method comprising:

a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides
comprising a sequence complementary to said target polynucleotide in the sample, and which probe
specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex
is formed between said probe and said target polynucleotide, and

b) detecting the presence or absence of said hybridization complex, and, optionally, if present,
the amount thereof.

7. A method of claim 5, wherein the probe comprises at least 30 contiguous nucleotides.

8. A method of claim 5, wherein the probe comprises at least 60 contiguous nucleotides.

9. A recombinant polynucleotide comprising a promoter sequence operably linked to a
polynucleotide of claim 1.

10. A cell transformed with a recombinant polynucleotide of claim 9.

11. A transgenic organism comprising a recombinant polynucleotide of claim 9.

12. A method for producing a diagnostic and therapeutic polypeptide, the method comprising:

a) culturing a cell under conditions suitable for expression of the diagnostic and therapeutic
polypeptide, wherein said cell is transformed with a recombinant polynucleotide of claim 9, and

b) recovering the diagnostic and therapeutic polypeptide so expressed.

13. A purified diagnostic and therapeutic polypeptide (DITHP) encoded by at least one of the
polynucleotides of claim 2.

14. An isolated antibody which specifically binds to a diagnostic and therapeutic polypeptide 
of claim 13. 

15. A method of identifying a test compound which specifically binds to the diagnostic and
therapeutic polypeptide of claim 13, the method comprising the steps of:

a) providing a test compound;

b) combining the diagnostic and therapeutic polypeptide with the test compound for a
sufficient time and under suitable conditions for binding; and

c) detecting binding of the diagnostic and therapeutic polypeptide to the test compound,
thereby identifying the test compound which specifically binds the diagnostic and therapeutic
polypeptide.

16. A microarray wherein at least one element of the microarray is a polynucleotide of claim 3.

17. A method for generating a transcript image of a sample which contains polynucleotides,
the method comprising the steps of:

a) labeling the polynucleotides of the sample,

b) contacting the elements of the microarray of claim 16 with the labeled polynucleotides of the
sample under conditions suitable for the formation of a hybridization complex, and

c) quantifying the expression of the polynucleotides in the sample. 

18. A method for screening a compound for effectiveness in altering expression of a target
polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence of claim 1, the
method comprising:

a) exposing a sample comprising the target polynucleotide to a compound, and

b) detecting altered expression of the target polynucleotide.

19. A method of claim 6 for toxicity testing of a compound, further comprising: 

c) comparing the presence, absence or amount of said target polynucleotide in a first 
biological sample and a second biological sample, wherein said first biological sample has been 
contacted with said compound, and said second sample is a control, whereby a change in presence,
absence or amount of said target polynucleotide in said first sample, as compared with said second 
sample, is indicative of toxic response to said compound. 
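
The comparison recited in step c) of claim 19 can be sketched in code. The following Python is a minimal illustration only and is not part of the claimed method: the `Measurement` type, the function name, and the two-fold threshold are assumptions introduced here, since the claim requires only "a change in presence, absence or amount" and does not state how large a change must be.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical record of a target polynucleotide in one sample."""
    present: bool   # whether the target was detected at all
    amount: float   # measured amount, in arbitrary units


def indicates_toxic_response(treated: Measurement,
                             control: Measurement,
                             min_fold_change: float = 2.0) -> bool:
    """Return True if the treated sample differs from the control.

    The fold-change threshold is an assumption for illustration;
    the claim itself names no threshold.
    """
    # A change in presence or absence counts as a change by itself.
    if treated.present != control.present:
        return True
    # Absent in both samples: nothing changed.
    if not treated.present:
        return False
    # Present in both: compare amounts in either direction.
    low, high = sorted((treated.amount, control.amount))
    if low <= 0:
        return high > 0
    return high / low >= min_fold_change


if __name__ == "__main__":
    treated = Measurement(present=True, amount=10.0)
    control = Measurement(present=True, amount=2.0)
    print(indicates_toxic_response(treated, control))
```

In practice the threshold, the units, and any statistical test would depend on the assay used to measure the target polynucleotide.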
SEQUENCE LISTING



<110> INCYTE GENOMICS, INC. 
HODGSON, David M. 
LINCOLN, Stephen E. 
RUSSO, Frank D. 
SPIRO, Peter A. 
BANVILLE, Steve C. 
BRATCHER, Shawn R. 
DUFOUR, Gerard E. 
COHEN, Howard J. 
ROSEN, Bruce 
CHALUP, Michael S. 
HILLMAN, Jennifer L. 
JONES, Annisa L. 
YU, Jinny Y. 
GREENAWALT, Lila B. 
PANZER, Scott R. 
ROSEBERRY, Ann M. 
WRIGHT, Rachel J. 
DANIELS, Susan E. 

<120> MOLECULES FOR DIAGNOSTICS AND THERAPEUTICS

<130> PT-1022 PCT 



<140> To Be Assigned 
<141> Herewith 



<150> 60/137,109; 60/137,337; 60/137,258; 60/137,260; 60/137,113;
      60/137,161; 60/137,417; 60/137,259; 60/137,396; 60/137,114;
      60/137,173; 60/137,411; 60/147,436; 60/147,549; 60/147,377;
      60/147,527; 60/147,520; 60/147,536; 60/147,530; 60/147,547;
      60/147,824; 60/147,541; 60/147,542; 60/147,500

<151> 1999-06-02; 1999-06-03; 1999-06-02; 1999-06-02; 1999-06-02;
      1999-06-01; 1999-06-03; 1999-06-02; 1999-06-03; 1999-06-02;
      1999-06-02; 1999-06-03; 1999-08-04; 1999-08-05; 1999-08-04;
      1999-08-05; 1999-08-05; 1999-08-05; 1999-08-05; 1999-08-05;
      1999-08-05; 1999-08-05; 1999-08-05; 1999-08-05

<160> 52 



<170> PERL Program 

<210> 1 
<211> 756 
<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 061149.1.j

<220> 

<221> unsure 
<222> 95 

<223> a, t, c, g, or other 
<400> 1 

gagaaatcca acaagctgct gctagctttg gtgatgctct tcctatttgc cgtgatcgtc 60
ctccaatacg tgtgccccgg cacagaatgc cagcncctcc gcctgcaggc gttcagctcc 120
ccggtgccgg acccgtaccg ctcggaggat gagagctccg ccaggttcgt gccccgctac 180
aatttcaccc gcggcgacct cctgcgcaag gtagacttcg acatcaaggg cgatgacctg 240
atcgtgttcc tgcacatcca gaagaccggg ggcaccactt tcggccgcca cttggtgcgt 300
aacatccagc tggagcagcc gtgcgagtgc cgcgtgggtc agaagaaatg cacttgccac 360



1/30

WO 00/73509

PCT/US00/15404



cggccgggta agcgggaaac ctggctcttc tccaggttct ccacgggctg gagctgcggg 420
ttgcacgccg actggaccga gctcaccagc tgtgtgccct ccgtggtgga cggcaagcgc 480
gacgccaggc tgagaccgtc cagcctgcag cagaatcacc tggcgaggag gtgcctgttg 540
aaaaagataa gaagtttcat tcagaaaaga tgaccctgaa gtaaggcaat gagcttccaa 600
aagtcttctc agtgcaatca ccaggcaaat tttataaggc agtcaagaag tttacctaga 660
aataaaaaaa ctcaagagac aataatagtg attaaagtta aatttgatcc cgttcccata 720
taaatcaatt atttatacac ttaacatcat ttgaga 756

<210> 2
<211> 982
<212> DNA
<213> Homo sapiens

<220>
<221> misc_feature
<223> Incyte ID No: 404508.3.j

<220>
<221> unsure
<222> 206
<223> a, t, c, g, or other

<400> 2

gagagaggat gactgcgcga gaagaagcta gcttacgaac acttgaaggc agacgacgtg 60 
ccaccttgct tagcgcccgt caaggaatga tgtctgcacg aggagacttc ctaaattatg 120 
ctctgtctct aatgcggtct cataatgatg agcattctga tgttcttcca gttttggatg 180 
tttgctcatt gaagcatgtg gcatangttt ttcaagcact tatatactgg attaaggcaa 240 
tgaatcagca gacaacattg gatacacctc aactagaacg caaaaggacg cgagaactct 300 
tggaactggg tattgataat gaagattcag aacatgaaaa tgatgatgac accaatcaaa 360 
gttagtgcac agaaaatctg ctaatctact tcaagaaatg gctgcctcag tagttcccct 420 
tcaagctttt taacttcatt attttctgtc tgggtgggaa tttctctttt cttctctatg 480 
tctcatttct tacattttct gtccattttt gcttacacat tcttgcgtta tacatttcac 540 
aactaaggat gtattaaaga aatgttataa actttccaac aatatttttt cttctatctt 600 
ttcttgtatt tgtcttagta ttatgtttac ctttactctt aatatactct tattacattt 660 
aataaaatct tagcaaaacc gttgttaata ttgatgcaaa atagatttaa tcaagaaaat 720 
gagaattacc agttacagct cccaaatatc agaacgctag caattaatac aatgaaaaga 780 
gaaaatatcc caactccaga cttttctcca aaagttttaa tatcacactt tggggtctag 840 
tggagtttat ttgtgggttg cagttatttc aaatgggact aaataatgca ttctccagtt 900 
tcttattatc aactagtact tttattcagc ctatggaaag acagttggcc taattgctgt 960 
tttgagaacc cagccatagc tc 982

<210> 3 
<211> 1039 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature

<223> Incyte ID No: 441227.2.j

<400> 3 

gcggcgccgg ggattgggag ggcttcttgc aggctgctgg gctggggcta agggctgctc 60 
agtttccttc agcggggcac tgggaagcgc catggcactg cagggcatct cggtcatgga 120 
gctgtccggc ctggccccgg gcccgttctg tgctatggtc ctggctgact tcggggcgcg 180 
tgtggtacgc gtggaccggc ccggctcccg ctacgacgtg agccgcttgg gaccggggca 240 
agcgctcgct agtgctggac ctgaagcagc cgcggggagc cgccgtgctg cggcgtctgt 300 
gcaagcggtc ggatgtgctg ctggagccct tccgccgcgg tgtcatggag aaactccagc 360 
tgggcccaga gattctgcag cgggaaaatc caaggcttat ttatgccagg ctgagtggat 420 
ttggccagtc aggaagcttc tgccggttag ctggccacga tatcaactat ttggctttgt 480 
caggtgttct ctcaaaaatt ggcagaagtg gtgagaatcc gtatgccccg ctgaatctcc 540 
tggctgactt tgctggtggt ggccttatgt gtgcactggg cattataatg gctctttttg 600 
accgcacacg cactgacaag ggtcaggtca ttgatgcaaa tatggtggaa ggaacagcat 660 
atttaagttc ttttctgtgg aaaactcaga aatcgagtct gtgggaagca cctcgaggac 720 
agaacatgtt ggatggtgga gcacctttct atacgactta caggacagca gatggggaat 780 
tcatggctgt tggagcaata gaaccccagt tctacgagct gctgatcaaa ggacttggac 840 
taaagtctga tgaacttccc aatcagatga gcatggatga ttggccagaa atgaagaaga 900 
agtttgcaga tgtatttgca aagaagacga aggcagagtg gtgtcaaatc tttgacggca 960
cagatgcctg tgtgactccg gttctgactt ttgaggaggt tgttcatcat gatcacaaca 1020
aggaacgggg ctcgtttat 1039 
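
Each sequence line in this listing ends with a running residue count (60, 120, ...), and the final count should equal the length declared under <211>. As a hedged illustration, the Python sketch below joins the ten-residue groups and checks each running count; the function name is hypothetical and the two sample lines are the first two lines of SEQ ID NO: 3 as printed above.

```python
def read_numbered_sequence(lines):
    """Join residue groups and verify each trailing running count.

    Assumes every non-empty line is whitespace-separated residue
    groups optionally followed by an integer running count.
    """
    residues = ""
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines left by text extraction
        running_count = None
        if tokens[-1].isdigit():
            running_count = int(tokens.pop())
        residues += "".join(tokens)
        if running_count is not None and len(residues) != running_count:
            raise ValueError(
                f"expected {running_count} residues, found {len(residues)}"
            )
    return residues


if __name__ == "__main__":
    seq3_start = [
        "gcggcgccgg ggattgggag ggcttcttgc aggctgctgg gctggggcta agggctgctc 60",
        "agtttccttc agcggggcac tgggaagcgc catggcactg cagggcatct cggtcatgga 120",
    ]
    residues = read_numbered_sequence(seq3_start)
    print(len(residues))
```

Applied to a complete entry, the length of the returned string can be compared with the <211> value (1039 for SEQ ID NO: 3) to detect residues lost or duplicated during extraction.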

<210> 4 

<211> 1673 

<212> DNA

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 277927.2 

<400> 4 

ggcagttgta aagtcgctgg ccagctagtg gagtggagac tgcagaggga gataaagaga 60 
gagggcaaag aggcagcaag agatttgtcc tggggatcca gaaacccatg ataccctact 120 
gaacaccgaa tcccctggaa gcccacagag acagagacag caagagaagc agagataaat 180 
acactcacgc caggagctcg ctcgctctct ctctctctct ctcactcctc cctccctctc 240 
tctctgcctg tcctagtcct ctagtcctca aattcccagt cccctgcacc ccttcctggg 300 
acactatgtt gttctccgcc ctcctgctgg aggtgatttg gatcctggct gcagatgggg 360 
gtcaacactg gacgtatgaa ggcccacatg gtcaggacca ttggccagcc tcttaccctg 420 
agtgtggaaa caatgcccag tcgcccatcg atattcagac agacagtgtg acatttgacc 480 
ctgatttgcc tgctctgcag ccccacggat atgaccagcc tggcaccgag cctttggacc 540 
tgcacaacaa tggccacaca gtgcgactct ctctgccctc tacGctgtat ctgggtggac 600 
ttccccgaaa atatgtagct gcccagctcc acctgcactg gggtcagaaa ggatccccat 660 
gtgggtcaga acaccagatc aacagtgaag ccacatttgc agagctccac attgtacatt 720 
atgactctga ttcctatgac agcttgagtg aggcttgctg agaggcctca gggcctggct 780 
gtcctgggca tcctaattga ggtgggtgag actaagaata tagcttatga acacattctg 840 
agtcacttgc atgaagtcag gcataaagat cagaagacct cagtgcctcc cttcaaccta 900 
agagagctgc tccccaaaca gctggggcag tacttccgct acaatggctc gctcacaact 960 
cccccttgct accagagtgt gctctggaca gttttttata gaaggtccca gatttcaatg 1020 
gaacagctgg aaaagcttca ggggacattg ttctccacag aagaggagcc ctctaagctt 1080 
ctggtacaga actaccgagc ccttcagcct ctcaatcagc gcatggtctt tgcttctttc 1140 
atccaaggat cctcgtatac cacaggaaga agaggctgga aaaccgaaag agtgtggtct 1200 
tcacctcagc acaagccacg actgaggcat aaattccttc tcagatacca tggatgtgga 1260 
tgacttccct tcatgcctat caggaagcct ctaaaatggg gtgtaggatc tggccagaaa 1320 
cactgtagga gtagtaagca gatgtcctcc ttcccctgga catctcctag agaggaatgg 1380 
acccaggctg tcattccagg aagaactgca gagccttcag cctctccaaa catgtaggag 1440 
gaaatgagga aatcgctgtg ttgttaatgc agagaacaaa ctctgtttag ttgcagggga 1500 
agtttgggat ataccccaaa gtcctctacc ccctcacttt tatggccctt tccctagata 1560 
tactgcggga tctctcctta ggataaagag ttgctgttga agttgtatat ttttgatcaa 1620 
tatatttgga aattaaagtt tctgacttta aaaaaaaaaa aaaaaaaaaa aaa 1673 

<210> 5 

<211> 968 

<212> DNA

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 475311.1 

<400> 5 

ccgcgtcgat cctgggatgg aggaggtggc ggccgctgag gctcggcgtg aagacggcgg 60 
gcatggtggg gcgggagaaa gagctctcta tacactttgt tcccgggagc tgtcggctgg 120 
tggaggagga agttaacatc cctaatagga gggttctggt tactggtgcc actgggcttc 180 
ttggcagagc tgtacacaaa gaatttcagc agaataattg gcatgcagtt ggctgtggtt 240 
tcagaagagc aagaccaaaa tttgaacagg ttaatctgtt ggattctaat gcagttcatc 300 
acatcattca tgattttcag ccccatgtta tagtacattg tgcagcagag agaagaccag 360 
atgttgtaga aaatcagcca gatgctgcct ctcaacttaa tgtggatgct tctgggaatt 420 
tagcaaagga agcagctgct gttggagcat ttcctcatct acattagctc agattatgta 480 
tttgatggaa caaatccacc ttacagagag gaagacatac cagctcccct aaatttgtat 540 
ggcaaaacaa aattagatgg agaaaaggct gtcctggaga acaatctagg agctgctgtt 600 
ttgaggattc ctattctgta tggggaagtt gaaaagctcg aagaaagtgc tgtgactgtt 660 
atgtttgata aagtgcagtt cagcaacaag tcagcaaaca tggatcactg gcagcagagg 720 
ttccccacac atgtcaaaga tgtggccact gtgtgccggc agctagcaga gaagagaatg 780 
ctggtaagaa ggattcctga gtcctgtctt agcgaaggtc cgctttgtct tttccatgct 840 
tgaactttca cagctgtact tggagtgtta ctgagtgaaa gccaaaagtg cttttttaaa 900
actaggagac caaacaaaag tagtttacat atacactgta ttcatgaaga ataaaaatat 960
tatgctct 968 

<210> 6 

<211> 3968 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No: 013039.2 



<400> 6 

cccaaagcaa aagaaataca gttacgatgg ggaccccctc tggttgatgg tggatcaccc 60
atttcctgtt acagtgtgga aatgtctcct atagaaaaag atgaacctag agaagtttac 120
caaggttctg aagtagaatg tacagtgagc agccttcttc ctggaaagac atacagcttc 180
agactacgtg cagctaacaa aatggggttt ggaccatttt cagaaaaatg tgatattact 240
acagcccctg ggccaccaga tcagtgcaag ccccctcaag tgacatgtag atctgcaact 300
tgtgcacaag tgaattggga ggttcctttg agtaatggaa cagatgtcac tgaatatcga 360
ctggagtggg gaggagttga aggaagtatg cagatatgtt actgtgggcc tggtctcagt 420
tatgaaataa aaggactttc accagcaact acctattatt gcagggtcca ggctctgagt 480
gttgtgggtg caggcccttt cagtgaagta gtagcctgtg tgactccacc atcagttcct 540
ggcattgtga cctgtcttca agaaataagc gatgatgaga tagaaaatcc ccattattca 600
ccttctacat gcctcgcaat aagctgggaa aagccttgtg atcatggttc ggaaatcctt 660
gcctacagca tagactttgg agataaacaa tccctaacag tgggaaaggt tacaagctat 720
attatcaaca atttgcaacc agatacaaca tacagaatac gaattcaagc cttgaatagc 780
cttggagctg gtcctttcag ccatatgata aaattaaaaa ctaagcctct ccctcctgat 840
ccacctcgtc tggaatgtgt tgcctttagc caccagaacc ttaagctgaa atggggagaa 900
ggaactccaa agacattgtc aaccgattct attcagtacc accttcagat ggaggataag 960
aatggacggt ttgtatccct atacagagga ccatgtcata catacaaagt acaaagactt 1020
aatgagtcaa catcctataa attctgtatt caagcttgta atgaagctgg ggaaggtccc 1080
ctctcccaag aatatatttt cactactcca aaatctgtcc cagctgcctt gaaagccccc 1140
aaaatagaga aagtaaatga tcacatttgt gaaattacat gggagtgttt acagccaatg 1200
aaaggtgatc cagttattta cagtcttcaa gttatgttgg gaaaagattc agaattcaaa 1260
cagatttaca agggtcccga ctcttccttc cggtattcca gccttcagct gaactgtgaa 1320
tatcgcttcc gtgtatgtgc cattcgccag tgccaagact ctctgggaca ccaggacctc 1380
gtaggtccct acagcaccac agtgctcttc atctctcaga ggactgaacc accagccagc 1440
accaacagag acactgtgga aagcacaagg acccgacggg cactgagtga cgagcagtgt 1500
gctgccgtca tccttgtgct gtttgctttc ttttccattt tgattgcctt tatcattcag 1560
tactttgtaa tcaagtgaaa atataacttt attttttaac tctattacat tttattttgt 1620
catgtactaa aattatttct gtattgcttt tataaaaaac agtggcattt agcactggca 1680
ttgagactat agcacatcat ttttgccatt ttcagtgctt atattgttag gtagaggctg 1740
gcactttatt agaatgcaag ccacaaaaat atcaattttg ttttttttgt tagggtgggt 1800
cttctttttt tctttccctc tctctttttt taacaaatgc cttcttatag aaaaactttc 1860
taagaggcaa caatttagaa tggatatttt gacgaatcgg catgagtgta acagtgataa 1920
cctgatgctg tttgttttaa agattattac caagtgaaaa attcagaatg aatagaattt 1980
acactaacat gctatataaa atgttaaagt ctgatgctgt gaaagcaatc tagtgctata 2040
tttctacctc ctcatttgtc ttaattattt ggtaagtggg attatgatga gtaactggag 2100
gggcttagaa acaaaaactg gatgaaagag tatgcatgaa gaaaagcttc tttgataaat 2160
gtggagttct tcattataaa tatatattca tgaattcaca gataagtact taaagaacag 2220
acagtttact tggcctaaaa atattttgat gtttactcaa aaagtacctc ttcaggtctt 2280
gagaacatgg aaaagaattg agtgctttta aatacttttt agaaagtaat ccataaaagt 2340
aaattgaatt tcaaacctat ttggcttctg ttttgtgaac ctttgaacta tatgtatgtg 2400
tataagggta tacacatacc atatatggca tataacaagt gtacacatat acacataaca 2460
agtgtagaag tatatattac ataacataca ctcactctgt ctggtatagg ctaattttga 2520
agaactccca taagtttctg ctgcttctcc cattaactgc ttgccaccac catcagaatt 2580
cataatcaaa cctaaccttt ttgtttgggg caccaaatct gaagacaaaa ttaatttgca 2640
ccagtaaact tcaagctgct ttctttcttg aaaactaaac gtttaacgta taatgtctgt 2700
ttggatactg ttccaaattg ttgattgcat gtggttaatg ttgcattaga gcactttgca 2760
attgcataat tcattaatgt tttgtgagct tgcatttgtg agttattgga tgatcagact 2820
gaattttgtc aagtatcaca ttgtacatct tgcctagatg tcgatgactg caagtaataa 2880
tacagtttat aatgaaacta tctacaattc ttgttttagc acatctgtta tccgtaaaac 2940
acctgtaact agctttttta atttattatt tgaattttag gatagcgaat cactaatttt 3000
tagttgctga ggttggcatt ttagtgatta ttaagcactt ctgtcagtct ttgaaaaaag 3060
aacgtatttt ttgtggcttt gaagatctct gaagaatttc ttttataata gaatgggcat 3120
gtattgtaac agttttatgt caaatgatct gtgctgtaga aaaacattaa cccttgttca 3180
aaaaagaaat ggataaactt ggcctttcta agtggtaaga atgacctgtc actataatat 3240
actgtatgtt tacattttat ttaaatttaa tctcttatgt atagggtgat aaccttcccc 3300
agaaacaaca gtgattgcga ttgttttcta gaaacttctt taaagtgcca catttggcag 3360
tacaaatgag tctgagtgta atagcccaga gatttatata tagttgaatg tctaaaatgg 3420
taaaatgtgc cactgtgtca agttacagtg gcttatgttt ttcatagtaa ttcaaatgaa 3480
cttcctattt ttgatagtaa atgtcattta atagtatact tgccatttga gcctcactgc 3540
aaaattagtg cagaggagaa aacaattttt aatgtaatct tgattttacc tcatatactg 3600
tacattccaa aaactctaaa ctttttaaag attatagata cactaccaaa catatcacct 3660
taaaattgta taaggctgaa tgaacttcat acaaatgaaa aaaatctcat aaaaatacat 3720
aaactatgta gcaaaagtat ctgtaaaatc catggaaaat aaaagttgta tcattctttt 3780
tgagatacgt ttattgtatt catatatatt cattatttgc tacctgttta agaaagtgaa 3840
atgttatggt ctcccctctt ccaatgagct taaaacattt ttcccaacag tatataaatc 3900
ttcaacatga gaggatgtat atttattata taaagcccag taaagaataa aattagaagt 3960
tttatcct 3968

<210> 7
<211> 1937
<212> DNA
<213> Homo sapiens

<220>
<221> misc_feature
<223> Incyte ID No: 2'

<220>
<221> unsure
<222> 9, 31, 1913, 1919-1920, 1924, 1927-1928
<223> a, t, c, g, or other

<400> 7

ccgttttcnc aatttgggct ggacgaaaaa nacggtcttg ctttcccggt cgccgctgtc 60
gggaagggct gcagggtgtc cgcgagaccc gcggcccggc gagctgaccc cgcctcgcct 120
ttcctgccta gccctcattc cacggagccg gtcgcgcccg gtccttgcgc gacgcttccc 180
ggcccaggcc gcctggtctg gcgctggagg ccggagtccc gcggcctgtg ctggatccgc 240
gcacacccag tggcggcgga tgggcggccg gggcggccgg ggcggccggt cctgagcgcg 300
gcccgggctg tcagggctgg ctgctggcgg gatggacacc ctggaggagg tgacttgggc 360
caatgggagc acagcgctac ccccacccct ggcaccaaac atcagtgtgc ctcatcgctg 420
cctgctgctg ctctacgaag acattggcac ctccagggtc cggtactggg acctcttgct 480
gctcatcccc aatgtgctct tcctcatctt cctgctctgg aagcttccat ctgctcgggc 540
gaagatccgc atcacctcca gccccatttt tatcaccttc tacatcctgg tgtttgtggt 600
ggcgctggtg ggcattgccc gggccgtggt atccatgacg gtgagcacct cgaacgctgc 660
aactgttgct gataagatcc tgtgggagat cacccgcttc ttcctgctgg ccatcgagct 720
gagtgtgatc atcctggggc ctggcctttg gccacctgga gagtaagtcc agcatcaagc 780
gggtgctggc catcaccaca gtgctgtccc tggcctactc tgtcacccag gggaccctgg 840
agatcctgta ccctgatgcc catctctcag ctgaggactt taatatctat ggccatgggg 900
gccgccagtt ctggctggtc agctcctgct tcttcttcct ggtctactct ctggtggtca 960
tccttcccaa gaccccgctg aaggagcgca tctccctgcc ttctcggagg agcttctacg 1020
tgtatgcggg catcctggca ctgctcaacc tactgcaggg gctggggagt gtgctgctgt 1080
gcttcgacat catcgagggg ctctgctgtg tagatgccac aaccttcctg tacttcagct 1140
tcttcgctcc gctcatctac gtggctttcc tccggggctt cttcggctcg gagcccaaga 1200
tcctcttctc ctacaaatgc caagtggacg agacagagga gccagatgta cacctacccc 1260
agccctacgc tgtggcccgg cgggagggcc tggaggctgc aggggctgct ggggcctcag 1320
ctgccagcta ctcgagcacg cagttcgact ctgccggcgg ggtggcctac ctggatgaca 1380
tcgcttccat gccctgccac actggcagca tcaacagcac agacagcgag cgctggaagg 1440
ccatcaatgc ctgagggcag ctgccagggc ctgtggagga caggccagag aggaggccag 1500
caggccccag agtccccagg ggaggaggac caggtcaagg gacgttctgt gggcagtagc 1560
cctgtgtggc cctgttccca ccatgagtct ggaggcccca cctccctggg gctcccaatc 1620
ccctttgcca tctctgctct cactggggac cctcctcccc ttcccacctg ctctcatact 1680
gctcagtgac atggcccagg ctttccttcc agggccatgc ttggcaaggt tggctgaggg 1740
caccctcctt ctctgcaccc ttggcacgag ggcagggctg gctctcccaa tgcctccatc 1800
ccatccccat ggtgctttgg cctcctcaaa gcatccacca tggtggatgg actgaagtgt 1860
gtatattttc ttgatctatt ttttaataaa aaggaaaagg agcagaaaaa aanaaaaann 1920
aaanaanngg aaaaaaa 1937

<210> 8
<211> 794
<212> DNA
<213> Homo sapiens

<220>
<221> misc_feature
<223> Incyte ID No: 345322.1.j

<220>
<221> unsure
<222> 50, 779, 786
<223> a, t, c, g, or other

<400> 8

attagcgtat tgttcctttc tgtattgtgc tgagaggatc caagggattn gtgggggaac 60
aggcaagcca ggcatcaccg tggatgttga agaagggggt tactcaaacc tcggcatctt 120
cacttgctcg atctggaatt acagctattt attacgaagc actctgtgtg gcttagtgga 180
gtgtgtctga ggaaacacat cccggacacc acttagggtt agtctttctg agctcttcac 240
atctctgact aattatcatc attaccatgg agcccaacag tcccaaaaag atacagtttg 300
ccgtgcctgt attccagagt cagattgcac ctgaagcagc agagcagatc aggaaaagaa 360
gacctacacc agcatcactt gtgattctca atgagcataa ccccccagaa atagatgaca 420
agagggggcc caacacacaa ggggaattac agaatgcatc ccctaagcaa aggaagcaga 480
gtgtgtacac accacccacc ataaaagggg ttaagcatct gaaaggccag aatgaatcag 540
cattccctga agaagaagaa ggcaccaatg aaagagagga gcagcgggac cattaattac 600
tggtctgcag caagaaggct tcttggaaat aactgaacta ttaacttttc tgagtatacc 660
atggaattcc actgcttgac ttccagaagc atcctccatc tctgcacccc acactcatac 720
agtagctatg cacatcctgg aagtctcctt gactgaactt tagaactaag tacacattnt 780
ccacancact tata 794



<210> 9 

<211> 3991 

<212> DNA 

<213> Homo sapiens 

<220>
<221> misc_feature
<223> Incyte ID No: 348094.6.j

<220>
<221> unsure
<222> 642
<223> a, t, c, g, or other

<400> 9

tttgtagaga ccgtgtctcc ctatgtcgcc caggaaaagg aaactattaa taagcttgga 60
cgaatatgct tcaactctag aattctgaaa aaaatcagaa aactaaaaac ttctgaagct 120
ctatttgaaa tctctcttca aattgagttc taaagttcct gttgcttcag acaatggatg 180
agcaatcaca aggaatgcaa gggccacctg ttcctcagtt ccaaccacag aaggccttac 240
gaccggatat gggctataat acattagcca actttcgaat agaaaagaaa attggtcgcg 300
gacaatttag tgaagtttat agagcagcct gtctcttgga tggagtacca gtagctttaa 360
aaaaagtgca gatatttgat ttaatggatg ccaaagcacg tgctgattgc atcaaagaaa 420
tagatcttct taagcaactc aaccatccaa atgtaataaa atattatgca tcattcattg 480
aagataatga actaaacata gttttggaac tagcagatgc tggcgaccta tccagaatga 540
tcaagcattt taagaagcaa aagaggctaa ttcctgaaag aactgtttgg aagtattttg 600
ttcagctttg cagtgcattg gaacacatgc attctcgaag antcatgcat agagatataa 660
aaccagctaa tgtgttcatt acagccactg gggtggtaaa acttggagat cttgggcttg 720
gccggttttt cagctcaaaa accacagctg cacattcttt agttggtacg ccttattaca 780
tgtctccaga gagaatacat gaaaatggga tacaacttca aatctgacat ctggtctctt 840
gggctgtcta ctatatgaga tggctgcatt acaaagtcct ttctatggtg acaaaatgaa 900
tttatactca ctgtgtaaga agatagaaca gtgtgactac ccacctcttc cttcagatca 960
ctattcagaa gaactccgac agttagttaa tatgtgcatc aacccagatc cagagaagcg 1020
accagacgtc acctatgttt atgacgtagg caaagaggat gcatgcatgc actggcaagc 1080
agctaaaaca ttgcaagatc atgaagagtg taaccaaagt aattgaaagt attttgtgca 1140
agtcgtacct ccccatttat gtctggtgtt aagattaata tttcagagct agtgtgcttt 1200
gaatccttaa ccagttttca tataagcttc attttgtacc agtcacctaa atcacctcct 1260
tgcaaccccc aaatgacttt ggaataactg aattgcatgt taggagagaa aatgaaacat 1320
gatggttttg aatggctaaa ggtttataga atttcttaca gttttctgct gataaattgt 1380
gtttagatag actgtcagtg ccaaatattg aaggtgcagc ttggcacaca tcagaataga 1440
ctcatacctg agaaaaagta tctgaacatg tgacttgttt cttttttagt aatttatgga 1500
cattgagatg aacacaattg tgaacttttg tgaagatttt atttttaaac gtttgaagta 1560
ctagttttag ttcttagcag agtagttttc aaatatgatt cttatgataa atgtagacac 1620
aaactatttg agaaacattt agaactctta gcttatacat tcaaaatgta actattaaat 1680
gtgaagattt ggggacaaaa tgtgagtcag acactgaaga gttttttgtt ttgttttaat 1740
atttttgata ttctctttgc attgaaatgg tataaatgaa tccatttaaa aagtggttaa 1800
ggatttgttt agctggtgtg ataataattt ttaaagttgc acattgccca aggctttttt 1860
tgtgtgtttt tattgttgtt tgtacatttg aaaaatattc tttgaataac cttgcagtac 1920
tatatttcaa tttctttata aatttaagtg cattttaact cataattgta cactataata 1980
taagcctaag tttttattca taagttttat tgaagttctg atcggtcccc ttcagaaatt 2040
tttttatatt attcttcaag ttactttctt atttatattg tatgtgcatt ttatccatta 2100
atgtttcata ctttctgaga gtataatacc cttttaaaag atatttggta taccaatact 2160
tttcctggat tgaaaacttt ttttaaactt tttaaaattt gggccactct gtatgcatat 2220
gtttggtctt gttaaagagg aagaaaggat gtgtgttata ctgtacctgt gaatgttgat 2280
acagttacaa tttatttgac aaggttgtaa ttctagaata tgcttaataa aatgaaaact 2340
ggccatgact acagccagaa ctgttatgag attaacattt ctattgagaa gcttttgagt 2400
aaagtactgt atttgttcat gaagatgact gagatggtaa cacttcgtgt agcttaagga 2460
aatgggcaga atttcgtaaa tgctgttgtg cagatgtgtt ttccctgaat gctttcgtat 2520
tagtggcgac cagtttctca cagaattgtg aagcctgaag gccaagagga agtcactgtt 2580
aaaggactct gtgccatctt acaaccttgg atgaattatc ctgccaacgt gaaaacctca 2640
tgttcaaaga acacttccct ttagccgatg taactgctgg ttttgttttt catatgtgtt 2700
tttcttacac tcatttgaat gctttcaagc atttgtaaac ttaaaaaatg tataaagggc 2760
aaaaagtctg aacccttgtt ttctgaaatc taatcagtta tgtatggttt ctgaagggta 2820
attttatttt ggaataggta aaggaaacct gttttgtttg tttttcctga gggctagatg 2880
catttttttt ctcacactct taatgacttt taacatttat actgagcatc catagatata 2940
ttcctagaag tatgagaaga attattctta ttgaccatta atgtcatgtt cattttaatg 3000
taatataatt gagatgaaat gttctctggt tggaacagat actctctttt tttttcttgc 3060
aatctttaag aatacataga tctaaaattc attagcttga cccctcaaag taacttttaa 3120
gtaaagatta aagcttttct tctcagtgaa tatatctgct agaaggaaat agctgggaag 3180
aatttaatga tcagggaaat tcattatttc tatatgtgga aactttttgc ttcgaatatt 3240
gtatcttttt aaatctaaat gttcatattt ttcctgaaga aaccactgtg taaaaatcaa 3300
attttaattt tgaatggaat aatttcaaag aactatgaag atgatttgaa gctctaattt 3360
atatagtcac ctataaaatg ttctttatat gtgttcataa gtaaatttta tattgattaa 3420
gttaaacttt tgaattgatt tgaggagcag taaaatgaaa gctatatcta ttctaaacct 3480
tatttagaca ttggtaccag ttacccaggt gaaaatatgg agtaactttg ttttgtatgg 3540
taaggtttag gaatggtgga tgaagggtat ctctatataa ataaagtgct caacaatgtg 3600
caatgattgt aaatttagta agatattaca gccatttcat gaatgcttta ccattcaaca 3660
tagtatctat tacaaaacac ctttcttgta tccatatact tcaggtgttg ctgttaacat 3720
ttactatgat atttatttta accaaaatgt tactcacatt aaatgtttat tctttaaaat 3780
gaatgtatta tgtttttaac ccacaaatgc atacttaccc tgtgcctcat atttcaatag 3840
tactgtaata tggacatctt ttgtgaaata cttttatttt gttatgcttt aaatatacat 3900
acaaaaagat ttctgttatt agctttgaaa attgtataat atcctaatat aaacaaaaat 3960
ataaaaataa aaatgaatac agtaaaatgt c 3991

<210> 10
<211> 2885
<212> DNA
<213> Homo sapiens

<220>
<221> misc_feature
<223> Incyte ID No: 233678.1

<400> 10

aaatagtaag aaaataatgt acagttggta ctcagatgtg gggcctaagc tatgaatggg 60
actagagcat tgctacaggt tgggcattgc tgggcactgt atgtaggtcg cctgacccta 120
tctcccatgg gagtgatgac cagtcccaga ggcctgtgtg actctgaatc ttccaacctc 180
acaggccatt ccctcccttc tgaaagccca cccatgggct tctggcccct gtccaaggcg 240
tgggagggag gccacggcct ctgatgcacc tgggcgtccc ctcccatccc ccctccctgc 300
acaggtgctg gaaacatgca tgaagagctg cggcaagcgg ttccacgacg aagtgggcaa 360
gttccgcttt ctcaacgagc tcatcaaggt cgtgtctccc aagtatctgg gctctcggac 420
atcggagaag gtgaagaaca agatcttgga gctcctctac agctggacag tgggcctgcc 480
cgaggaggtg aaaatcgcag aggcctacca gatgctaaag aagcagggga ttgtaaagtc 540
cgaccccaag cttccagatg acactacctt tccccttcct cctccacggc cgaagaatgt 600
gatctttgaa gatgaggaga aatccaagat gctggcccgc ctgctgaaga gctcccatcc 660
cgaagacctc cgcgcaccaa taagctcatc aaagagatgg tgcaggagga ccagaagcgg 720
atggagaaga tctcgaagag gggtgaatgc catcgaggag gtgaacaaca atgtgaaact 780
ggctcacgga gatggtgatg agccacagcc agggcggcgc acagctggca gcagcgagga 840
cctcatgaag gaactgtacc agcgctgtga gcggatgcgg cccacgctct tccgactggc 900
gagtgacaca gaggacaatg atgaggcctt agcggagatc ctgcaggcca atgacaacct 960
cacccaggtg atcaacctgt ataagcagct ggtgcggggt gaggaggtca acggtgatgc 1020
cacagccggc tccatccctg gggagcacct cggccctgct ggatctctca ggcctggatc 1080 
tcccgcctgc gggcaccacc tacccagcta tgcccacccg ccctggcgag caggccagcc 1140 
ctgagcagcc cagtgcctca gtttccctgc ttgacgacga gctcatgtct ctgggcctca 1200 
gtgaccccac acccccttca ggcccaagcc tggatggtac cggatggaac agcttccagt 1260 
cgtcggatgc cactgagccc ccagcccctg ctctgggccc aggcccccag tatggaaagc 1320 
cgacccccag cgcagacatc cctgccagca agcagcggtc tggacgacct agacctcctg 1380 
gggaagaccc tcctgcagca gtcgctgccc ccggaatccc agcaagtgcg gtgggagaag 1440 
cagcagccaa ccccccggct cacactccgg gacctgcaga ataagagcag cagctgcagc 1500 
tcccccagct ccagcgccac cagccttctc cacaccgtgt ccccagagcc ccccaggcct 1560 
ccgcagcagc ccgtaccaac cgagctctca ctggccagca tcactgtgcc cctggagtcc 1620 
atcaaaccca gcaacatcct gcccgtgact gtgtatgacc agcacggctt ccgcatcctc 1680 
ttccattttg cccgggaccc actgccaggg cgctccgacg tgctggtggt ggtggtttcc 1740 
atgctgagca ccgcccccca gcccatccgc aacatcgtgt tccagtcagc tgtccccaag 1800 
gttatgaagg tgaagctgca gccaccctcg ggcacggagc tgccagcttt taaaccccat 1860 
cgtccacccc tcagicaatca cccaggtcct gctgcttgcc aacccccaga aggagaaggt 1920 
tcgcctccgc tacaagctca ccttcaccat gggtgaccag acctacaacg agatggggga 1980 
tgtggaccag ttccccccac ctgaaacctg gggtagcctc tagaacagag gggctgggga 2040 
gaggaagggg cagagggacc ggtcactgtc cagcctggag ggaggcattg gtggccaagg 2100 
acaccctttg ttgcccatgg ccattcaccc ccaggcctgg tgcttctccc cacacccctg 2160 
taggcctcaa gtgactcttc cccctcctgc tccggccccg cccctgctga gccaaaccca 2220 
gtaggaggct ggggcctggg tttgtgccgc tggggtctcc atcaccggga cctggagagg 2280 
gaggggctgt gtagccttgg aagaacttgg gtcatgggga ggaagcacag ctgttgggga 2340 
agggccagga cctcaggccc agccccaacc ccagctgggg tggggtcttc cccacctgtc 2400 
tcttatgcct tatgggaagg cccagccata actcgggggc catgctggag ctggggacca 2460 
gcttaggcct cctccatagg aacccagtga ctggggggtg acgcctacac ccccagctat 2520 
ttgcactctg gtgtgtggtt tgactctgct tttcttccgg attggccctg tggtcacagc 2580 
ctcagggggc cagggctggg ggaacctcac ctggcccgta ctcctggggg tttccctttg 2640 
ccattgggcc ccctgaggga ctgtgggggc tcaagggtaa tgccagaggc ccatggcccc 2700 
agcgaggggc tgtggggcac ctagagttct cggtgtgtct ccttcattca ttggcctctg 2760 
ctggggcctc ctatgggtgt cttacgtctg tccatccatc tgtccgtggt cagaagtggg 2820 
gtcagtgtgt gagtgagagc aggagtattt atgaaaataa aacgtcgttt ttcctggaaa 2880 
aaaaa 2885 



<210> 11 

<211> 2458 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 312243.1 



<400> 11 

ggggcctcac aggccacgtg agcttctgag agctctgagc atgaggtcac gggctcccag 60 
aggcagtgat gtggcagcgc ctgtctcccg ccccccgtcc agctccgacg ccggcctgcg 120 
gttcccggac agcaacggcc tcctgcagac cccacgctgg gacgagccgc agcgggtgtg 180 
cgccctggag cagatttgcg gcgtgttccg cgtggatctg ggccacatgc gctccctccg 240 
ccttttcttc agcgacgagg cctgcaccag cggccagctg gtcgttgcca gccgagagag 300 
ccagtacaag gttttccact tccaccacgg cggcctggac aagctgtctg acgtgttcca 360 
gcagtggaaa tactgcaccg agatgcagct caaagaccag caggtcgccc ccgataagac 420 
atgcatgcag ttctccatcc gccgccccaa gctgccgtcc tccgagacgc accccgagga 480 
gagcatgtac aagaggctcg gcgtctccgc ctggctcaac cacctgaatg agctgggcca 540 
ggtggaggag gagtacaagc tgcggaaggc cattttcttt ggcggtattg atgtgtcaat 600 
ccgcggggag gtctggccct tcctgctgcg ctattacagc cacgagtcca cgtcggagga 660 
gcgggaggcg ctgcggctgc agaagcgaaa ggagtactct gagatccagc agaaaaggct 720 
ctccatgact cccgaggagc acagagcgtt ctggcgtaat gtgcagttca ctgtggacaa 780 
agacgtggtc cggacagatc ggaacaacca gttcttccgg ggggaagaca atcccaatgt 840 
ggagagcatg aggaggatcc tgctgaacta cgccgtgtac aaccctgccg tcggtattcc 900 
caagggatgt cggacctggt ggcgcccatc ttggccgagg tcctggatga gtcagacacc 960 
ttctggtgct ttgtgggttt gatgcagaac acgatcttcg tcagctcacc ccgggacgag 1020 
gacatggaga aacaactgct gtacctgcgc gagctgctgc ggctgacgca cgtgcgcttc 1080 
taccagcacc tggtctcgct gggcgaggac ggcctgcaga tgctcttctg ccaccgctgg 1140 
ctccttctgt gcttcaagcg ggagttcccc gaggccgaag cgctgcggat ctgggaggcc 1200 
tgctgggccc actaccagac ggactacttc caccttttca tctgcgtggc catcgtggcc 1260 
atctacgggg atgacgtcat cgagcagcag ctggccacgg accagatgct cctgcacttc 1320 
ggaaacctgg ccatgcacat gaacggggag ctcgttctcc ggaaggcgag gagtttgctg 1380
taccagttcc gcctcctgcc ccggatcccc tgcagcctgc acgatctgtg taagctgtgc 1440
gggtcaggca tgtgggacag cggctccatg cccgcggtgg agtgcaccgg ccaccatccc 1500 
ggctcggaga gctgtcccta cgggggcacg gtggagatgc cttcccccaa gtccctgagg 1560 
gaaggcaaga agggcccaaa gacgccgcag gacggcttcg gcttccgcag ataggtcggg 1620 
cccccgacac cggacagggg ttgaggggac ctcctcagag gccctgggca cgggaggggg 1680 
tggggctggg cgtgaagggg acaggggacg atagaaacct aaggaaaatg cttttgggca 1740 
acatgagagg aaccttttca tattaatgac aaaattagag tctggaagtg acagaagtca 1800 
gatctacagc cacccagagg aaagtcagct cctgaaacgc tgcagtggaa cgcgcagcca 1860 
ccgcacctga gacgcaggct ggctgggctc tcctgctggc tgccctggag gatttcaaca 1920 
tgtcccagga tttgctccac cctcgagggc agccagacag cgtcgccagg caatgaggaa 1980 
agcagagaca ggagaggaag gcctcactca cccactgcgt cgagggctgc agaacacagc 2040 
ggggtcctgt ccaggcccag ggacatcttt gcaagccaga cacacttcct cttgagacct 2100 
cgttctctcg gagtgagcca aacacacttc ccaaaacgtc cccagccaca gctgggatgc 2160
cgatggaaag gcatctgcca taaaagaaaa gcaaaagata aaaagcccaa ccgatgtggg 2220
gatagagagg cggaagagca gtcaggcttg aggagctggc gcttgtaatg tttatccgtt 2280 
taaacatttc gtcctcctgg tacacgaagg gaactgtctg cccaggagcc tgagcctcag 2340 
gctgttggag aagcatctga tgcctttttc tttgctgggg gtcttctacg tgaggttcct 2400 
tggcgttgtt taaggtcaac tccaccaaat acagcaccca gctggggctt gaatggga 2458 

<210> 12 

<211> 748 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No: 425487.3 

<220> 

<221> unsure 
<222> 635 

<223> a, t, c, g, or other 
<400> 12 

gcggccgcag cgtcggggct ggagcgatgg cggcgaccgc ggtggcggcg gctgtggcgg 60 
gaaccgagtc ggcccagggt cccccgggcc cggcagcgtc gctggagctg tggctgaaca 120 
aagccacaga cccaagcatg tcggaacagg attggtcagc tatccagaat ttctgtgagc 180 
aggtgaacac tgaccccaat ggccccacac atgcgccctg gctactggcc cacaagatcc 240 
agtctccgca agagaaggaa gctctttatg ccttaacggt gagtttggct tgccttgtag 300 
cctaaccttt cctgtccttg tgctatagag agccgagagg cgctttgctt ccacacagat 360 
ccttttcact ggagaaggga ggtcccctga gaggtttacc ttcacagcag gttggaggag 420 
ggagatctgg gccagggtcc ccacccttct tcctcctgtt tctcgtctgt ttcttcaggg 480 
cagcattgag tctactgtgg gagaaggaaa ggaaggtttt atttcatttg cccttcttag 540 
accagggctg gcaagttcat gtgccttttc tagagagggc agttctagca cagggcccac 600 
tagctgttgg caggtatgag tatgggctca gggtngctag tttaggtctt tgtatatgtt 660 
tgagtgttta gatgttggtg actaatttgc agtgttgaaa gccacagtat ggacgagatg 720 
aggggtctgc aggctcagct tgcctctt 748 
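
The <222> field of an "unsure" feature lists the positions at which the residue is given as "n" (a, t, c, g, or other). A hedged Python sketch of that cross-check follows; both function names are hypothetical, and the sample line is the SEQ ID NO: 12 line covering positions 601 to 660, with its running count omitted.

```python
def expand_positions(spec):
    """Expand a <222> value such as '9, 31, 1919-1920' into a list."""
    positions = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.extend(range(start, end + 1))
        else:
            positions.append(int(part))
    return positions


def find_n_positions(residues, first_position=1):
    """Return the 1-based positions at which the residue is 'n'."""
    return [first_position + i
            for i, base in enumerate(residues) if base == "n"]


if __name__ == "__main__":
    line = ("tagctgttgg caggtatgag tatgggctca "
            "gggtngctag tttaggtctt tgtatatgtt")
    residues = line.replace(" ", "")
    declared = expand_positions("635")
    found = find_n_positions(residues, first_position=601)
    print(found == declared)
```

Running the same comparison over a full entry gives a quick check that the "n" residues in the printed sequence agree with the declared <222> positions.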

<210> 13 

<211> 1098 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No: 346813.3.j

<220> 

<221> unsure 

<222> 20, 47, 97, 106, 114, 941, 944, 947, 985, 1014, 1042 
<223> a, t, c, g, or other 

<400> 13 

cacaaaacca ttgatagttn caacacccta acccagccac cttactnctg gtaacagaga 60
gcccagttaa acataactgt ttagaggtgc tggactnagt ttattntagt aggnctaacc 120 
tccgagacca cccttaaaca tcagtagact gggagctgta cgtggatggg agcggctttg 180 
ccaacccctg caaagtgact ctgaagaagg agacaagccc tgctccagtc acacccagaa 240 
gctgactggt ccacgcacag gcgaagcatg aggaaactca ttgcgggact cattttcctt 300 

aaaatttgga cttgtacagt aaggacttca actgaccttc ctcagattga gaactgtttc 360 

cagtatatac atcaagtcac tgaggtagga caaaaattgc tacagtccta ttattttatg 420 

gttattataa gtgtaccagg actctaaaag aaacttgttt gtataatgct atccaaggta 480 

tgtagcccag ggaataacca acctgatgtg tgttatgacc cattttaagc ctcccgtgat 540 

cacagttttt aaaataaaat taaggactgg tccttttcta ggtgacacaa gtaaggtaat 600 

agctagaatg gaaaaaagag gggcccccaa aaatgtaacc ttaaaatttg gtgcttgtgc 660 

cgctattgat agtaagcagc atggaatagg atgcggttct ctaaattgga aaaaaaaagt 720 

gacacagtaa aaaaaaaata agtgtatctg tcaagaattg tatttatgtg agatgtgtca 780 

atactggtct tgtgtcattt gggctactta aaaagaagat aaaaaagatc ctgtttggcg 840 

gcttagttgt cttgggcaaa aactgtataa tggtaccaca aaaagttaca tggtggagtt 900 

ccaattacac agaaagaaat ccattcagta aatttccaaa nttncanact gtttgggccc 960 

acccagaact ccaccgggac tgganagccc ccaccgggtt atacgcagag cttntgctaa 1020 

gctccctgat cagtggacag gnagctgtgt aattggcacc attaagccat ctttcttctt 1080 
atgcccataa aaacaggt 1098 

<210> 14 

<211> 539 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 006861.1.j 

<400> 14 

tttgctgttt gatattggaa tacactctta aatcaatgtg gttatgccac acatcatttt 60 
aatgtgaatt tctcacttta tgcttttttg cgactgagtt attacttgcc gtttaagttt 120 
attttagact gtggaaaaga cgttagacaa aaagcaaatt cgagtgattt tcttattcga 180 
gttaagaatg ggttgtaaag cagtggagac aactcggcaa catcagcacc acattgggcc 240 
caggagccgc tgggaaccac taaacgagca tacagtgcag tgctgggtca agaagtttta 300 
caggctggga ttagaggcgt gagcactgtg ccctactgaa cgctgtttct agtgataact 360 
tgctgctcta tttttagcaa gtttactata tttatttact ttattcaacc agcaattact 420 
gagaatctgc tataggcagg ttctgggaac acaggaaaca aagcaaacaa cagaaataag 480 
aattggtctt ttcctggtag gaatttaaaa tctaatggga gcagcagcag tctctttcc 539 

<210> 15 

<211> 863 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 028008.3.j 

<220> 

<221> unsure 
<222> 522 

<223> a, t, c, g, or other 
<400> 15 

gggatctggc ctttgtaccc ctaggctgga gttaggtccg gagtttttat cttcatcttt 60 
agtactgtac tgtttctcag gttcattgag gaggcactga gaacaaatgg gatatttttt 120 
ccatgatgag ggaaaaaacc ccaaccctga gagagattca gtgaggatag cgggagacgg 180 
aaaagttaga cctcagagca cttctctgcc agctcttcta ttcttttttc aagtcccttg 240 
tgggcaagga tgtggtcgtg gaactaaaga atgacctgag catctgtgga accctccatt 300 
ctgtggatca gtatctcaac atcaaactaa ctgacatcag tgtcacagac cctgagaaat 360 
accctcacat gttatcagtg aagaactgct tcattcgggg ctcagtggtc cgatacgtgc 420 
agctgccagc agatgaggtc gacacacagt tgctacagga tgcggcaagg aaggaagccc 480 
tgcagcagaa acagtgatgg ctcctcctcc tcttcccctc cntctttcat tggtgaccca 540 
taaccccaag tcccagccca gaacccctaa cccccaatac ttgaaggggt tttgtttttt 600 
tactaatgat ggttttgtgg gtttttttta agggatgagt ggatgagagg agtaataggg 660 
aacagctatc ctctcttgag aaggggagga taagtaggct gggaaacttc aaagccttcc 720 
cagtccccag cacctgcctt tctcactact tctctggaga tggtaggaga gtttcctagg 780 
tctttccagg gcagcatgtg attcatttgg ggatggaagg aatctgtccc gcatcgggaa 840 
tctaaatttat gatgcaaaaa aaa 863 
<210> 16 

<211> 2600 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 346078.5.j 

<400> 16 

cctgaggact cagcgaaggg tgggcgccgc cgaggcctcc tgccgctggc gggtttccgc 60 
ggagtgccgc ccggctccgc tctgccgccg gcgcggctca tgggcagagt cggccgggcg 120 
ggccggcatt aaactgaaga aaagatgtcc ctgtacgatg acctaggagt ggagaccagt 180 
gactcaaaaa cagaaggctg gtccaaaaac ttcaaacttc tgcagtctca gcttcaggtg 240 
aagaaggcag ctctcactca ggcaaagagc caaaggacga aacaaagtac agtcctcgcc 300 
ccagtcattg acctgaagcg aggtggctcc tcagatgacc ggcaaattgt ggacactcca 360 
ccgcatgtag cagctgggct gaaggatcct gttcccagtg ggttttctgc aggggaagtt 420 
ctgattccct tagctgacga atatgaccct atgtttccta atgattatga gaaagtagtg 480 
aagcgccaaa gagaggaacg acagagacag cgggagctgg aaagacaaaa ggaaatagaa 540 
gaaagggaaa aaaggcgtaa agacagacat gaagcaagtg ggtttgcaag gagaccagat 600 
ccagattctg atgaagatga agattatgag cgagagagga ggaaaagaag tatgggcgga 660 
gtggccattg ccccacccac ttctctggta gagaaagaca aagagttacc ccgagatttt 720 
ccttatgaag aggactcaag acctcgatca cagtcttcca aagcagccat tcctccccca 780 
gtgtacgagg aacaagacag accgagatct ccaaccggac ctagcaactc cttcctcgct 840 
aacatggggg gcacggtggc gcacaagatc atgcagaagt acggcttccg ggagggccag 900 
ggtctgggga agcatgagca gggcctgagc actgccttgt cagtggagaa gaccagcaag 960 
cgtggcggca agatcatcgt gggcgacgcc acagagaaag gtgtgtcccc agggaagcgt 1020 
gtgactagag ggaaaggact ggccccatcc atatcagaca tggccagtct tgatcctcat 1080 
gtgtcagcag ggggacaatg aggcgtgtgg ccagagggag agggctggcc ctgccatcac 1140 
tagaacacag gccgtcctgt tcatatgatg cactgccact tccgttttgt gaaaccagga 1200 
atcctgaggc tcatctttat tttttcagaa cagacgtaga gagatgaagg cttgtggagg 1260 
aaaagatggt gagagacttg ggcagaaaat gagtagtcct caggaagaaa tcttggttat 1320 
gtgtttagag catgaaggac agagccatat agtgtggcag tgaatatacc tgctatctcc 1380 
atctcagagg tcgtctctac ttttcccttt tgccctttca gtatagatgt gatttctgat 1440 
tctcttacag attgtttgct ttgcgagatc tgatgttatg ttgcagtctc ttggtaaatg 1500 
atgcctagtt ggtgttttat tttcatttaa tttttacagt ctgttctgtg ttgagggaat 1560 
tcaggaaaga gacaaacata tgttagcatt ttaatcaggg aattaagttt gagtcagcct 1620 
agctgaactt cctttgctaa agaaagaaga aaacttttct ggcagccccg ttcatgcaca 1680 
gcttaggata catcacgagc ctgacaggtg agtgccagaa accaacagtt gtcccgactt 1740 
gtgaggttat ctgaagtaag gcagccggtg gctggattag taactgcata ttcccctggg 1800 
cccgtgacct tgaacgtttg ctccaagtca actcacctat aggaattatc actcacatgc 1860 
cctgtcagcc ctttgggaag tgagatgagc aaaaattgca agtaatggtg gaggctcaaa 1920 
acatccagat gctattgtaa aaacatgcca aagcaaagca gaggctttta ttgcagataa 1980 
ggctgtgttt tcgctcagag accaattgtg tagatgccta ggacataaat ggcggggatc 2040 
gctattgaaa ttaaattaat tattgtaagt aggactcagt tctgtaacac atctaatgat 2100 
atgctgctca gttctgtaac acatctaatg gtatgttttg atacagatgc atccaagaag 2160 
tcagattcaa atccgctgac tgaaatactt aagtgtccta ctaaagtggt cttactaagg 2220 
aacatggttg gtgcgggaga ggtggatgaa gacttggaag ttgaaaccaa ggaagaatgt 2280 
gaaaaatatg gcaaagttgg aaaatgtgtg atatttgaaa ttcctggtgc ccctgatgat 2340 
gaagcagtac ggatattttt agaatttgag agagttgaat cagcaattaa aggttagtgg 2400 
tacagctaaa tattaaagaa taaaaaagtt gaatttacag ccttaatctt tacaagtaaa 2460 
gttactgttt aattaaagtt aaacctttat cttataggat acctgtatta actgtctttt 2520 
gtttgccttt cagcggttgt tgacttgaat gggaggtatt ttggtggacg ggtggtaaaa 2580 
gcatgtttct acaatttgga 2600 

<210> 17 

<211> 414 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 394637.1.j 

<400> 17 

tggttatgtt atacataatt ttaacttgca tttcttgctt tatgtttttt ttgctaatga 60 
cttactactt gctgtttatt ttatatttat tttagactat ggaaatgatg ttagacaaaa 120 
agtaaattcg agagattttc ttatttaagc tcacaatggg tcgtaaaaca gcagagacaa 180 
cttgcaatat caacatgcat ttggcccagg aactgctaat gaacgtacag tgcagtggtg 240 
gttcaagaag ttttgcaaaa gataccagag ccttgaggat gaggagtgta gtggcaggcc 300 
actggaagtt gacaacaacc aattgagagc aatcattgaa gcagattgat cctcttacaa 360 
ttacatgaga agttgctgaa gaactcaacg taagaactca acgtcaacca ttcc 414 

<210> 18 

<211> 1307 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 222429.3 

<400> 18 

gggaaatcca gccctcccaa actcttgcct ttcttgtaag actttctaag cgcatggctt 60 
tgtcgtaaga cttctcgtgg ccatgtgtct caccctaccg ttcgtctcac caatgaacca 120 
gagctgggat accagcaaaa aggacccaaa agactgtttc accacaattg aagaggtcgt 180 
ggaccatccc ccaaaatgaa tcccaacact ggaagaccaa gagggttgac gtttgctatc 240 
ttgaaagctg cagcccccgt agagaagatc ctagagcaga aggagcacag gctgggctgc 300 
catagcatgg acccccaaaa agccactgtc atgaagaaat atcctgtgaa gaaaatcttt 360 
gcagggcacc tgaactccca aagccactga ggagaagaat tgcaaccttg gccaatggac 420 
ctgatcggtt ttggttatgc agccctcgtg acatttggaa gcatttttgg atataagcgg 480 
agaggtggtg ttccgtcttt gattgctggt ctttttgttg gatgtttggc cggctatgga 540 
gcttaccgtg tctccaatga caaacgagat gtaaaagtgt cactgtttac agctttcttc 600 
ctggctacca taatgggtgt gagatttaag aggtccaaga aaataatgcc tgctggtttg 660 
gttgcaggtt taagcctcat gatgatcctg agacttgtct tgttgctgct ctgagcatct 720 
ggaggaacag aaaactaagt tcatgtcatc ctgctgtaat gggcagagca tatttttttt 780 
gtatttaaaa gataaacttc aatatggaat gctagaaaca caaatagcac tgtcacctct 840 
aatatgaaca ttagtttgag gtagtttttt tctaaaacaa aaattttaac tgttttctaa 900 
ttgtcaagca ctattttcat taaaagtgtc taatgaatca tgatatactc ttccatttgt 960 
tgtgtctatt ttttatatat ttggtatttt ttgaaaattc caaatactca tgtctcaagt 1020 
aagcttaaac tacaacttgt cacataaagg aagtcttaag tggagttcac agaatgataa 1080 
tgtatctatt tgtcatttgt gttatatttg aaattattag aaattatgct ttttccattt 1140 
taattgtatt gctgccagtg ctattttttt ctttaaaaaa ttttattctt agcacactgt 1200 
tatgtcctaa ctgaatgtat tcagtattca aataaaagac aaatgacaaa tagatacatt 1260 
atcattctgt gaactccact taagacttcc tttatgtgac gtcgacg 1307 

<210> 19 

<211> 1406 

<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 

<223> Incyte ID No: 366739.2 

<220> 

<221> unsure 
<222> 311 

<223> a, t, c, g, or other 
<400> 19 

tgctgccagc gagagccgcg ggagagtgtg cagccgagtc actactgcct gcctgcctgc 60 
ctgctacggt gagtgtggcc cccacaatgg gatggcgcag ggcaggaggg ccatgggttc 120 
ccccacccca gactaagggg gcactagggg aggggccgag tcatgtgaag agggagaccc 180 
tctcagacag tcgaatgtgc tggtcccact aaggaaacca cctcaccctc tccaacttcc 240 
tgcctgaaaa tgggccctgg agctcgcaga cagggcagga ttgtgcaggg aaggcctgag 300 
atgtgcttct ncccaccccc taccccactc cctccccttc ggatcttaac actgggcact 360 
cacacaccca ccccatgctc ctctccaggc tcagcagcag gtactgtacc caaccatggg 420 
ctcgcaggcc ctgcccccgg ggcccatgca gaccctcatc tttttcgaca tggaggccac 480 
tggcttgccc ttctcccagc ccaaggtcac ggagctgtgc ctgcctgcct gtccacagat 540 
gtgccctgga gagccccccc acctctcagg ggccacctcc cacagttcct ccaccaccgc 600 
gtgtggtaga caagctctcc ctgtgtgtgg ctccggggaa ggcctgcagc cctggcagcc 660 
agcgagatca caggtctgag cacagctgtg ctggcagcgc atgggcgtca atgttttgat 720 
gacaacctgg ccaacctgct cctagccttc ctgcggcgcc agcaccagcc ctggtgcctg 780 
gtggcacaca atggtgaccg ctacgacttc cccctgctcc aagcagagct ggctatgctg 840 
ggcctcacca gtgctctgga tggtgccttc tgtgtggata gcatcactgc gctgaaggcc 900 
ctggagcgag caagcagccc ctcagaacac ggcccaagga agagctacag cctaggcagc 960 
atctacactc gcctgtatgg gcagtcccct ccagactcgc acacggctga gggtgatgtc 1020 
ctggccctgc tcagcatctg tcagtggaga ccacaggccc tgctgcggtg ggtggatgct 1080 
cacgccaggc ctttcggcac catcaggccc atgtatgggg tcacagcctc tgctaggacc 1140 
aagccaagac catctgctgt cacaaccact gcacacctgg ccacaaccag gaacactagt 1200 
cccagccttg gagagagcag gggtaccaag gatcttcctc cagtgaagga ccctggagcc 1260 
ctatccaggg aggggctgct ggccccactg ggtctgctgg ccatcctgac cttggcagta 1320 
gccacactgt atggactatc cctggccaca cctggggagt aggccaagaa ggaaaatctg 1380 
acgaataaag acccccgctg ccccat 1406 

<210> 20 
<211> 3028 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 474635.6 

<220> 

<221> unsure 

<222> 2063-2064, 2068, 2070, 2078-2079 
<223> a, t, c, g, or other 

<400> 20 

ccccagcccg ccatggcgtc aggcgccgcg gccccgggga ggtggctccc actttaagaa 60 
gtgaagtttt gctcccctcc ccctccctgc ccacctcctg cagcctcctg cgccccgccg 120 
agctggcgga tggagctgcg cacgggagcg tgggcagcca ggcggtggcg cggaggatgg 180 
atggggacag ccgagatggc ggcggcggca aggacgccac cgggtcggag gactacgaga 240 
acctgccgac tagcgcctcc gtgtccaccc acatgacagc aggagcgatg gccgggatcc 300 
tggagcactc ggtcatgtac ccggtggact cggtgaagac acgaatgcag agtttgagtc 360 
cagatcccaa agcccagtac acaagtatct acggagccct caagaaaatc atgcggaccg 420 
aaggcttctg gaggcccttg cgaggcgtca acgtcatgat catgggtgca gggccagccc 480 
atgccatgta ttttgcctgc tatgaaaaca tgaaaaggac tttaaatgac gttttccacc 540 
accaaggaaa cagccaccta gccaacggta ttttgaaagc gtttgtctgg agttagaaag 600 
ttctcttctt caacacgtcc ctccccaggg tgttcctccc tgtgacccag ccgcctcgac 660 
ttcggcccgc ttgctcacga ataaagaact cagagttgtg tgtgcaatgc acacccagac 720 
acacgcacgc acacacacgc gcgcgcacac acatgctttt ttctgttccc ctccgctttc 780 
tgaagcctgg ggagaaatca gtgacagagg tgttttggtt ttattgttat gtgggttttc 840 
ttttgtattt tttttgtttg ttttgttttt aaacattcaa aagcaattaa tgatcagaca 900 
taggagaaac cctgaataga aacaaaactt ttgaatgctg gattcaaaaa aaaaaaaaag 960 
ttatctggac agcttctttg agactattta aaaactggta caacaggtct ctacaacgcc 1020 
aagatctaac taagctttaa aaggtcaaga agttttatgg ctgacaaagg actcgcgcaa 1080 
cgcagaaggc ctttcccacc ttaagcttcc ggggatctgg gaattttacc cccattctct 1140 
tctgtttgtc tgagtctcat ctctctgcaa gcaagggctg aaatcatttt gtttggttgt 1200 
tttgagggag agaggcgggg tgggggggtg caaatctgcc agcagctctt acgtaaggca 1260 
tgttttattg gggagggctg agcttttatt ttctcctctc cagtggggtt ggcttttatt 1320 
gtttcttgtt tgggtttgga atggaaatat ggatagcagc ataaagtact tttattttga 1380 
caaaattcat ttttttcaac aatggagaca tagatttgac ccacaataac ttctccccct 1440 
ctctttttac tctgctcaaa aagcatctct cctcccatta cccaaccttg gtcataagtg 1500 
tgcctggctg gtttgcagat atttgttctg ctttgtaaaa attggccatt agtgcattta 1560 
ttgagatgat ctctaaagag ctatgccctg acctacccct gattctatga cattggggcc 1620 
cttcttttgc tgaaactgcc ttacgtaatg gttttactcc ttgaaagaga tttgacggaa 1680 
tccattttat gccaagtgct gccctgcact gtttctgcaa tatgtggtgt atgctgtggt 1740 
gatcttgctg ggaatgatta taagtgtgtg tgtggtgggg gagtgggtat tacatgcatt 1800 
gctgaagagt catcctggtg ttcctcattc ctcccacctt cccgtggtca ttttaattac 1860 
ggggcagtgt caccgcaaag ggaggaaact caaagccgaa agcaaaattc caggcctgat 1920 
tctggctttt gaggttcctg gttcttgaag ccaggcctga cccgactctc agatggggtc 1980 
agtcccgtcg ctttgcagac tgaccctgga aatctacaaa atgcagattt tcctgatttc 2040 
ctcttctctt gcccagtttt ttnntgtntn tttttttnna aaagcctgga ttgtaaccag 2100 
attttctttt ttcccccttc tcagctgtag atatgatatc tcctttcagg gccccagctt 2160 
aagggcaaag tgagttaatg tgtagacaaa ggcgagggac aagagagagt taacatctag 2220 
acagtggaaa aagccatggt gtgtggtttc tgggaaccac caacacttgc aggtttagct 2280 
ttttcccagg gttgactaca agaaagaaaa ccatgttttt gcaagattaa aatgtggttg 2340 
agtgtgccta aattaaccat ccccattttt atcatatttc caccatcact tcagggtttt 2400 
aagagtcagt gctcacctgg gcggagctgg tagtacattt tgcttcttag aaagctaagt 2460 
cctgggttcc gtctgatttt aggttccagg aacttcctga gaacacccga tcgcagaggg 2520 
taattttctg gagtttgttt tgcagggata gctgggagta tggccaccct gctccacgat 2580 
gcggtaatga atccagcaga agtggtgaag cagcgcttgc agatgtacaa ctcgcagcac 2640 
cggtcagcaa tcagctgcat ccggacggtg tggaggaccg aggggttggg ggccttctac 2700 
cggagctaca ccacgcagct gaccatgaac atccccttcc agtccatcca cttcattttt 2760 
ttttgcaggg tgctgcctat gggccctctg ctccccaatg ccttagagag aggaggggag 2820 
ccacggccgc tcaccggaag gctgtgtgcg gggacatccg aggtggtggt ggacaggaag 2880 
gacttgggaa ggggagcgag aaattgcttt ttctcttcct ccttgggcag aatgtagctt 2940 
ttctgcttca ctgtggcagc ctcctccctg gatccttaga tcccagagga gggaagaaaa 3000 
tttgcagtga ctgaaaacag taaaaaaa 3028 

<210> 21 

<211> 537 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 228470.1.j 

<220> 

<221> unsure 

<222> 110, 114, 159, 163 
<223> a, t, c, g, or other 

<400> 21 

cggagaagtc atactctctc acaccctcgg ctttcttgtt gtgtccttca gcaaaacagt 60 
ggatttaaat ctccttgcac aagcttgaga gcaacacaat ctatcaggan aganagaaag 120 
aaaaaaaccg aacctgacaa aaaagaagaa aaagaagang aanaaaaatc atgaaaacca 180 
tccagccaaa aatgcacaat tctatctctt gggcaatctt cacggggctg gctgctctgt 240 
gtctcttcca aggagtgccc gtgcgcaggg aagatgccac cttccccaaa gctatggaca 300 
acgtgacggt ccggcagggg gagagcgcca ccctcaggtg cactattgac aaccgggtca 360 
cccgggtggc ctggctaaac cgcagcacca tcctctatgc tgggaatgac aagtggtgcc 420 
tggatcctcg cgtggtcctt ctgagcaaca cccaaacgca ttacagcatc gagatccaga 480 
acgtggatgt gtatgacgag ggcccttaca cctgctcggt gcagacagac aaccacc 537 

<210> 22 

<211> 3080 

<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 

<223> Incyte ID No: 407090.5.j 

<220> 

<221> unsure 

<222> 2926, 2938, 2941, 3008, 3052 
<223> a, t, c, g, or other 

<400> 22 

gcggcgctgg ctttaggtga acgacgtggt gaggagtggg tttcgggcat gagaagtcac 60 
agggccgttt cctagtctct cttcacttct ttgggtcttc tcagagaaag aaggctgccg 120 
tgggtaggct gggggcggag actatcggga agagaaaatt acttttccca ctgaaacaca 180 
cccaagtata tgcccagcct tcatgaaagt gaacagagaa acgaagcgcc tttatgtggg 240 
tggccttagc caggacattt ctgaggcaga cctacaaaat cagttcagca gatttggaga 300 
agtttcggat gtggagatca tcacacggaa agatgaccaa ggaaacccac agaaagtttt 360 
tgcatatatc aacatcagtg tagcagaagc ggacctgaaa aaatgtatgt ctgttttaaa 420 
taaaacaaaa tggaaaggtg gaacattaca aattcaacta gcaaaagaaa gctttctgca 480 
cagattggcc caagagagag aagcagcaaa agctaagaaa gaagaatcaa caacaggtaa 540 
cgccaacttg ttagaaaaga caggaggagt ggatttccat atgaaagctg tgccagggac 600 
agaagtgcca gggcataaga attgggttgt gagcaaattt ggaagagtct tacctgttct 660 
tcaccttaaa aatcaacata aacgtaaaat catcaaatat gatccctcaa aatactgcca 720 
caacctgaag aagatagggg aggatttctc aaacaccatt cctatatcca gcctgacttg 780 
ggaattagaa ggagggaatg accctatgag taagaaacgg cgaggagagt tctctgactt 840 
tcatggccct cccaagaaga taataaaagt gcagaaggat gagagttccc actgggtctc 900 
tggccatgag tacaaggccc aggagggtaa tagagagacc acccttaaca cagcaacagg 960 
ctgcacaaaa aagaacttgt gattccatta ctccttctaa atcatctcct gtacctgttt 1020 
ctgatactca gaaacttaaa aatctacctt ttaagacttc tggcttggaa actgccaaga 1080 
agagaaacag catttctgat gatgatactg attctgaaga tgaattgaga atgatgattg 1140 
cgaaagagga aaacttacag agaactacac aaccctcaat aaatgaatct gaaagtgatc 1200 
cttttgaagt tgtaagggat gatttcaaat caggcgttca caaactgcat tctttaatag 1260 
gtttaggtat caaaaatcgt gtctcttgcc atgatagtga tgatgatatt atgagaaatg 1320 
atcgtgagta tgactcagga gatacagatg aaattattgc gatgaaaaaa aatgttgcta 1380 
aggtcaaaaa cagtacagaa ttttcacaaa tggaaaaatc tacgaagaaa acttctttca 1440 
aaaatagaga aaactgtgag ctttctgatc actgtattaa actacaaaaa agaaaaagca 1500 
atgtagagtc agccctcagt catggattaa agtctcttaa tcgtaaatct ccctctcact 1560 
ccagtaggca gtgaagatgc tgattctgca tcagaattag ctgactctga aggaggtgag 1620 
gagtataatg ccatgatgaa aaactgcctt cgtgtgaatc tcactttagc tgatttggaa 1680 
caattggctg gcagtgatct gaaggttcca aatgaagata ctaagagtga tggaccagaa 1740 
accaccaccc aatgcaagtt tgacagatgc tccaagagcc ccaagactcc cactggcctc 1800 
cgcagaggcc gacagtgtat tcgtcctgcg gagattgtgg cttccctgtt agaaggagag 1860 
gagaacacct gtggcaaaca gaaaccaaag gaaaacaatt taaagccaaa atttcaggct 1920 
ttcaagggag taggctgtct atatgaaaag gagtcaatga aaaaatcctt gaaagacagt 1980 
gttgcctcta acaataaaga tcagaattcc atgaaacatg aggatcccag tatcatatcc 2040 
atggaagatg ggtccccata tgttaatggc tcattaggtg aagtgactcc atgccaacat 2100 
gcaaagaagg cgaatggccc aaactatatt cagcctcaaa aaagacagac cacttttgaa 2160 
agccaggatc gcaaggcagt gtcccctagc agttctgaaa agagaagtaa gaatcctatt 2220 
tctaggccat tagaaggtaa gaagtcctta agtcttagtg caaagactca caacatgggc 2280 
cttgacaaag acagctgcca tagtaccaca aagacagaag cttcacagga agagcggtct 2340 
gattcaagcg gcctcacatc tctcaagaaa tcaccaaagg tctcatccaa ggacactcgg 2400 
gaaatcaaaa ctgatttctc actttctatt agtaattcgt cagatgtgag tgctaaagat 2460 
aagcatgctg aagacaatga gaagcgtttg gcagccttgg aagcgaggca aaaagcaaaa 2520 
gaagtgcaga agaagctggt gcataatgct ctggcaaatt tggatggtca tccagaggat 2580 
aagccaacgc acatcatctt cggttctgac agtgaatgtg aaacagagga gacatcgact 2640 
caggagcaga gccatccagg agaggaatgg gtgaaagagt ctatgggtaa aacatcaggg 2700 
aagctgtttg atagcagtga tgatgacgaa tctgattctg aagatgacag taataggttc 2760 
aaaattaaac ctcagtttga gggcagagct ggacagaagc tcatggattt acagtcgcac 2820 
tttggcaccg atgacagatt ccgcatggac tctcgatttc tagaaactga cagtgaagag 2880 
gaacaggaag aggtaaatga aaagaaaact gctgaggaag aagagnttgc tgaagaanaa 2940 
nagaaagccc tgaatgttgt acaaagtgtt ttgcaaatca acttaagcaa ttctacaaac 3000 
agaggatnag tagctgctaa gaaatttaag gacatcatac attatgatcc ancgaagcaa 3060 
gaccatgcca cttacgaaag 3080 

<210> 23 

<211> 426 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 068194.1.j 

<220> 

<221> unsure 

<222> 328, 403 

<223> a, t, c, g, or other 

<400> 23 

cccaagtccc tagaagaagc caccccatcc aaggagggtg acatcctaaa gcctgaagaa 60 

gaaacaatgg agttcccgga gggggacaag gtgaaagtga tcctgagcaa ggaggacttt 120 

gaggcatcac tgaaggaggc cggggagagg ctggtggctg tggacttctc ggccacgtgg 180 

tgtgggccct gcaggaccat cagaccattc ttccatgccc tgtctgtgaa gcatgaggat 240 

gtggtgttcc tggaggtgga cgctgacaac tgtgaggagg tggtgagaga gtgtgccatc 300 

atgtgtgtcc caacctttca gttttatnaa aaagaagaaa aggtggatga actttgcggc 360 

gcccttaagg aaaaacttga agcagtcatt gcagaattaa agntaaacat gtattctgaa 420 
aacaaa 426 

<210> 24 

<211> 3219 

<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 

<223> Incyte ID No: 411449.2 

<220> 

<221> unsure 
<222> 3088 

<223> a, t, c, g, or other 
<400> 24 

ggaaagggaa acaaccacat aaacgcaaga aaaaatccag gaaaaagtct ctcaaaaaac 60 
ctgctttatt cttagaggca gaaagtaaca cttcacattc agatgattca gcatccagca 120 
gttctgagga aagtgaggaa agagacacta agaaaaccaa aaggaaaaag agagagaaaa 180 
aagcccatac ctctgtagcc aacaatgaaa tacaggagag gacaaacaaa cgcacaaatt 240 
ggaaagtagc tacagatgaa aggtctgctg agagctcaga ggatgactaa atgggaaaca 300 
cttttgtttt ccacatgact gtggatattt acagttctta ctccttgtgg ttttgccagt 360 
gactcttgtt cagcacgggg cctgaggtca gagctgtctt gtgccatctg tatgttctga 420 
cagacgtctt gtcttctatt ttggcgttaa gcttgatccc cttttcttgt taaaagggaa 480 
tctggtattt tgttatgaag gtttcttgaa gagatttttt tttttgcaat taattacgtt 540 
tagtgtagag tgcatataca gcaaattaaa ggacccagaa agctggatcc aatagtgacc 600 
tgggtacacc aatcggaata ttgaatttgg ggaagtcaag ggctgggatc caagaggtgg 660 
attggaacta atgccatgta ggatggtatg acaaggcaac actgtattgc tctctgttta 720 
tatagcaggt gtcacaacta acttgtcttt agccttggtg ctttgatcct tctatatttt 780 
gaccccacag gtgtggtccg gtttacttaa tcaggacatg ggcctaagaa caaacctttt 840 
cccttcatga taacatccat agacaactta ttagaaggga ctagagtttt tgcaaatttc 900 
cctgctggat ggggcctata gctatactta gtatatgcct aaacatggta attggatagt 960 
aaatggtttt ctagttccat tgctgtatat ttgcctaaat ggacttgtgt tcaaattatt 1020 
tcttcaattg tcatagataa tcctgtacca aatggggaag aattaggaaa taatcatgtt 1080 
gtctaatggt actctggatt cagggcagca actgccattt aaatgttgtc ttgttcattt 1140 
ctaaatctgt tcatgaagtt taggttttcc ctgaaactaa gttgaattat ttccaaaatg 1200 
aaacaggctt ctcagggaca tatccacttc ttcccagtct gcctttggat taaagcacca 1260 
agcagagacc acattaattc cctttgctat actgtgatcc ttagtatgtt aattcttaag 1320 
aaaccaacat atcactgaaa gaaggctggc agaacgcaag tgcatttttt cactgtggga 1380 
agaaagatca agtgacgtat tattttttcc tggttgtcac ttaatgggct gagtaaaaag 1440 
cttgaaaact cagactttcg gtcttggttc tgccactcat tggttatgag gaggcccaga 1500 
gcaggtaagt tcaccttcct ggccttactt tcctgatgtg taatacggaa ttacttcaca 1560 
gtagcatgac agtataagac accagcagta gatacaacta tgatgacatt ccatgagttg 1620 
gtatttttag ttctaactgc taaatttgtt ctctttacgg gacagatttc taataaagtg 1680 
cttggtctta aaatacatgg ttggacagag gtgccctatc ccttaactat gagcaggtgc 1740 
taccttttgg gatatttatt ttaaatttta atactttggt actcaattgt cagtgttcca 1800 
tggtgtgtat ttttattttt gggattagtg ggggtctaaa gggagaagaa tagtctctaa 1860 
ttactacctc ttaacctaaa gcaattattt tgttcctgga gcaagttaaa tctttgttgg 1920 
aaggagcttt ggccatatat tttttagcat gcattgtttc tgtgccctga aagtacctga 1980 
aaggttttaa gcacagactc aggaaaatgt gccagtagaa caggccatct ccaggaaatt 2040 
ggctctattt gggtcctgac cttcccttcc tcccaagtta gcaggcttgt tcttttgcaa 2100 
ggaatacaca tcttgccttt tttttttttt ttgccatgtt ttccttttct tggtcatgta 2160 
taagcaataa agctgttttt tgttcttcat ctttcttaac cccaaatttt cttctatgcc 2220 
ttaggcttcg atggttcttc caaccccctt aatatggctt agggtggttt ttcaaaacct 2280 
acaatccccc atttgcacta ctggccatgg aacatttatt tctagtgttc ctgccaatca 2340 
gagatctcta tattaaattc taaaatggga ttaaaagaag agttggagaa ttcacactta 2400 
ttgagtaact gatgtcatac aacctggaat ttctgaattc caaataaata aatttcactc 2460 
tttgaacatt tcatctttta ctttttagca ccaacagact tgataacagc ctgatgctga 2520 
tctgacaatg ggttgatagc cttcccccac tgacccttaa atctgcttag taacaagtcc 2580 
tttgcttctg tcattctcct gggggatggc ctactgccct cctttctgta caatctgggc 2640 
aaaccgactg gtgatggcaa gagtggtgtc aatgaagcgg tctacacagc tggagagaca 2700 
attttcagtg cgagagtcta ggcgattccc tggcttctcc acacatttat cccaacataa 2760 
ctccatgaag tgatgcacta tttgtgaggc tgggtactaa gaaatcagtt ctctctaagg 2820 
tcctcaagga tagctgtcat cacctcccat ttaagaggcg tgattatgta gtccaaggtc 2880 
atgtagccag caagaagtca ggccgcgtta gaaccatgtc cgaagggctc caaacccttg 2940 
ttctacatcc atagtctaca gcgactactt cagagtccac cttcctccga taatgttcta 3000 
gtcgttttca aatacattgt cgcatatgat cctcacaatc cagtgaggca gtggggtgac 3060 
ggcgaaacca aaacccacag tggaatgnag tatcttttct acgggcacgt ggcctgtgca 3120 
gtaaactgcg ccttctgctg ctcggcggcc accaggcgct gcaactccgc ttcatcggct 3180 
tcgcccagct ccgccattgt tcgcctgcag gctcgccac 3219 

<210> 25 
<211> 445 
<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature 

<223> Incyte ID No: 018549.2 

<220> 

<221> unsure 
<222> 34, 418 

<223> a, t, c, g, or other 
<400> 25 

tgacttttgg gagagctgac cttttgtgac tttngggaga gctgccaaaa gtgaaactta 60 
gtgcctcaga caagcagggg caagtctgct aaggaagctg tggccagaag cacagatcag 120 
aaacacgatg gctctgttaa cagccgaaac attccgctta cagtttaaca acaagcgcct 180 
cctcagaagg ccttactacc cgaggaaggc cctcttgtgt taccagctga cgccgcagaa 240 
tggctccacg cccaccagag gctactttga aaacaagaaa aagtgccatg cagaaatttg 300 
ctttattaac gagatcaagt ccatgggact ggacgaaacg catgcttacc aagtcacctg 360 
ttacctcacg tggagcccct gctcctcctg tgcctgggag ctggttgact tcatcaangc 420 
tcacgaccat ctgaacctgg gcatc 445 

<210> 26 
<211> 1657 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 

<223> Incyte ID No: 236043.3 

<220> 

<221> unsure 
<222> 5 

<223> a, t, c, g, or other 
<400> 26 

ccggnccggg ctcctcctgc tgctgggaca gtgctctagt agaacagaca gacctactga 60 
cacaggggag gtgagaaggg aggtgaccac caggactggc tctgtgagta ccacacagtg 120 
gggagggggt gggggccacc atgtcatcat atcagaagga actggagaaa tacagagaca 180 
tagatgaaga tgagatccta aggaccttga gccccgagga gctagagcag ctggactgcg 240 
aactacagga gatggatcct gagaacatgc tcctgccagc tggactaaga caacgtgacc 300 
agacaaagaa gagcccaacg gggccactgg accgagaggc ccttttgcag tacttggagc 360 
aacaggcact agaagtcaaa gagcgtgatg acttggtgcc cttcacaggc gagaagaagg 420 
ggaaacccta tattcagccc aagagggaaa tcccagcaga ggagcagatc accctggagc 480 
ctgagctgga ggaggcactg gcacatgcca cagatgctga aatgtgtgac attgcagcaa 540 
ttctggacat gtacacactg atgagtaaca agcaatacta tgatgccctc tgcagtggag 600 
aaatctgcaa cactgaaggc attagcagtg tggtacagcc tgacaagtat aagccagtgc 660 
cggatgaacc cccaaatccc acaaacattg aggagatact aaagagggtc cgaagcaatg 720 
acaaggagct ggaggaggtg aacttgaata atatacagga catcccaata cccatgctaa 780 
gtgagctgtg tgaggcaatg aaggcaaata cctatgtgcg gagcttcagt ctggtagcca 840 
cgaggagtgg tgaccccatt gccaatgcag tggctgacat gttgcgtgag aatcgtagcc 900 
tccagagcct aaacatcgaa tccaacttca ttagcagcac aggactcatg gctgtgctga 960 
aggcagttcg ggaaaatgcc acactcactg agctccgtgt agacaatcag cgccagtggc 1020 
ctggtgatgc agtggagatg gagatggcca ccgtgctaga gcagtgtccc tctattgtcc 1080 
gctttggcta ccactttaca cagcaggggc cacgagctcg ggcagcccag gccatgaccc 1140 
gaaacaatga actacgtgag taactgcaga catgatgtgt ggagtggtca gggaagtgca 1200 
gaaattggtg gatcctcctg aaagcagacc taatgactaa cagcccaggg tgctaccaaa 1260 
gagctattca tatgagaatt tcaaatccca agagtgatca gaaagtaaac agaaaactcc 1320 
ccctgcccca aatatggacc agtagagagg tagaaaggat gccagagata atcatacttg 1380 
tttagaggta tcgtaatttt atttgtggtg tggtttcttg ttttgttttg ttaatgtggg 1440 
atgggcctgt aaggtggcct aagacactac caatttatga gtttggctaa gggactggaa 1500 
aggagaagag tgcctttgca gaaagcagga gctggaacaa acacttatgt ttaatattgc 1560 
tcctttacag gtcgccagca aaagaagaga taacactgca tttcccttta ccaactagcg 1620 
ctgggagcac tggtcactta aatcctcatc tgtcctc 1657 

<210> 27 
<211> 1041 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 445433.2 

<400> 27 

ttcttcttta attagaagct taaagagaag tttgcagaat gtactcataa gtggatggga 60 
taatactgtt aagttctgat attctgatat tgtttgaaat actgttaaga atttcacatt 120 
tggtaagtat tttttatatc agtattaaaa tagtaatttg gtttattaca attttataca 180 
tagaatttgc cagttacttt ctgactacaa agaaaaacag atactaaaag tctcttctga 240 
aaacagcaat ccagaacaag acttaaagct gacatcagag gaagagtcac aaaggcttaa 300 
aggaagtgaa aatagccagc cagaggaaat gtctcaagaa ccagaaataa ataagggtgg 360 
tgatagaaag gttgaagaag aaatgaagaa gcacggaagt actcatatgg gattcccaga 420 
aaacctgcct aacggtgcca ctgctgacaa tggtgatgat ggattaattc caccaaggaa 480 
aagcagaaca cctgaaagcc agcaatttcc tgacactgag aatgaacagt atcacaggga 540 
cttttctggc catcccaact ttcccacgac ccttcccatc aaacagtgat gaacaaaatg 600 
atactcagaa gcaactttct gaagaacaga acactggaat attacaagat gagattctga 660 
ttcatgaaga aaagcagata gaagtggctg aaaatgaatt ctgagctttc tcttagttat 720 
aagaaagaaa aagacctctt gcatgaaaat agtacgttgc aggaagaaat tgtcatgcta 780 
agactggaac tagacataat gaaacatcag agccagctaa gagaaaagaa atattcggag 840 
gaaattgaaa gtgtggaaaa aaagaatgat gatcttttaa agggtctaca actgaatgag 900 
ctcaccatgg gatgatgata ctgccgtgct cgtcattgac aacggctctg gcacgtgcaa 960 
ggccggcttt gcaggtgacg atgccccccg ggctgtcttc ccttccatcg tgggggtgcc 1020 
cccaggcacc agagcatgat g 1041 

<210> 28 

<211> 2113 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 344630.7.j 

<400> 28 

gcagtgcggc cgtcatggcg tcgcccttca gcggggcgct gcagctgacg gacctggatg 60 
acttcatcgg gccgtctcag gagtgcatca agcctgtcaa agtggaaaaa agggcgggaa 120 
gtggcgttgg ccaagattcg cattgaagat gacgggagct acttccaaat taaccaagac 180 
ggcgggaccc ggaggctgga gaaggccaag gtctcgctaa acgactgcct ggcgtgcagc 240 
ggctgcatca cctccgcaga gaccgtgctt atcacccagc agagccacga ggagctgaag 300 
aaggttctag atgctaacaa gatggcggca cccagtcagc agaggctggt tgtagtttcg 360 
gttctcacca cagtctagag catcgctggc tgcacggttt cagctgaatc ctacagatac 420 
tgccaggaaa ttaacctcat tctttaaaaa aataggggtg cacttcgtct tcgacaccgc 480 
cttctcaagg cacttcagcc tcctgggaga gccagcgaga gtttgtgcgg cgattccgag 540 
gacaggccga ctgcagacag gcgctgcccc tgctggcctc tgcctgccca ggctggatct 600 
gctatgccga gaagactcac ggcagcttca tcctccccca catcagcacc gcccggtccc 660 
cgcagcaggt catgggctcc ctggtcaagg acttcttcgc ccagcagcag cacttgaccc 720 
ctgacaagat ctaccacgtc acagtgatgc cctgctatga caaaaagctg gaagcctcca 780 
gacccgactt tttcaaccag gagcaccaga cacgggatgt ggactgtgtc ctcacaacag 840 
gagaagtttt caggttgctg gaggaagagg gcgtctccct ccccgacctg gaaccagccc 900 
ctctggacag cctgtgcagc ggtgcctctg cagaggagcc caccagccat cggggagggg 960 
gctcgggggg ctacctggag cacgtgttcc ggcacgcggc ccgagagctc tttggaatcc 1020 
atgtggctga ggttacctac aaacccctga ggaacaaaga cttccaggag gtgacactgg 1080 
agaaggaggg ccaggtgctg ctgcacttcg caatggcgta cggcttccgc aacatccaga 1140 
acctggtgca gaggctcaaa cgagggcgct gcccctacca ctacgtggag gtcatggcct 1200 
gcccctcagg ctgcctgaac ggcgggggcc agctccaggc cccagacagg cccagcagag 1260 
agctcctcca gcacgtggag agactgtacg gcatggtccg ggctgaggcg cccgaggacg 1320 
cgcctggggt tcaggagctg tacacacact ggctgcaggg cacggactcg gagtgtgcag 1380 
gtcgcttgct gcatacgcag taccacgccg tggagaaggc cagcactggc ctggtgcatc 1440 
cggtggtagg ggctgcagga ccaggactcc caggaggccg tgtccatgtg tgacagcaga 1500 
accacatgcc ccaagacccc agggcttccc ccaaaattct gagtgagctg cagggtgtgc 1560 
tgggacccga gtaggagcta ggactagcca ggacccgcag ccgcctcgtc acctccagtt 1620 
gggtgcctct gggttcccac tggctctgcc caggtggggt ggggtggccc aggcagcaga 1680 
aggttccctg aggtcccaga gcctgttccg ttggccctgg gccgaggccc acaggtgctg 1740 
cccttgctgc tgctggtcgg gcacccaagt gcgtgagggg cttcagcctg tcccggggtt 1800 

gcctgaggca gagcaagacg ggttctcacc cctgacttct ggaggcttcc cttgaagctc 1860 

tgtgcaaaag gtgggagaca gagctggacc tgcaggggtg gtcccgccac aaccctgcgt 1920 

gtggaccctg gcaggggggg gtgccaggcc cctggaaagc aggggttacc gttacgaggc 1980 

tgtggtccgg ggcaagccaa gtacgaagca gcagccatcg cgggctgcat catcccccag 2040 

ccaggtcccc accaggcctg tctcccagcg tttgtctaat aaacgcaccc ctcctaaaac 2100 

acgcctacga aaa 2113 

<210> 29 

<211> 3813 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 257121.2 

<220> 

<221> unsure 
<222> 1365 

<223> a, t, c, g, or other 
<400> 29 

aaatcaaggg agctaaatgg cggggtggat ggaattgcaa gtattgaaag tatacattct 60 
gaaatgtgta ctgataagaa ctccattttc tctacaaata cctcttctga caatggatta 120 
acttccatca gcaaacaaat tggagacttc atagagtgcc ctttgtgcct tttgcggcat 180 
tctaaagaca gatttcctga tataatgact tgtcatcaca gatcttgtgt ggattgctta 240 
ccgacaatat ttaaggatag aaatctctga aagcagagtt aatattagtt gcccagaatg 300 
tactgaacgg tttaatcccc atgatattcg cttgatatta agtgatgatg tcttgatgga 360 
aaaatacgaa gaatttatgc ttagacggtg gcttgttgca gatcctgatt gtaggtggtg 420 
tccagctcca gactgtggat atgctgtgat agcatttgga tgtgccagct gtccaaaatt 480 
aacttgtggg cgagagggct gtggaacaga gttttgctac cactgtaaac agatttggca 540 
ccccaaccag acctgtgatg ctgctcgaca agagagagcc cagagcttac gtttgagaac 600 
tatacgttct tcatccatta gttatagtca agagtctgga gcagcagctg atgatataaa 660 
gccatgtcca cgatgtgctg cttatataat aaagatgaat gatgggagct gcaatcacat 720 
gacatgtgct gtttgtggtt gtgagttttg ttggttgtgt atgaaagaaa tctcagattt 780 
gcattatcta agtccatcag gatgtacttt ttgggggaag aaaccctgga gccgaaagaa 840 
gaaaatattg tggcaactgg gaacactggt tggtgctcct gtcggaatcg ctttaatagc 900 
tggcattgct attcctgcaa tgattattgg cattcctgtg tatgtgggcc gcaagattca 960 
caatcgctat gaaggcaagg atgtttcaaa gcacaaacgg aatttggcca tagcaggtgg 1020 
tgtaacgttg tctgtaatcg tgtctccagt agtagctgca gtgactgtag gtatcggtgt 1080 
tcctattatg ttagcttatg tctatggcgt agttccaatt tctctttgtc gaagcggagg 1140 
ttgtggagtc tcagcaggca atggaaaagg agttaggatt gaatttgatg atgaaaatga 1200 
tataaatgtt ggtggaacta acacagctgt agacacaaca tcagtagcag aagcaagaca 1260 
caacccaagc ataggggagg gaagtgttgg tgggctgact ggcagtttga gtgcaagtgg 1320 
aagccacatg gatcgaatag gagccatccg agacaacctg agtgnaacgg ccagcaccat 1380 
ggcactagct ggagccagta taacggggag tctgtcagga agtgccatgg taaactgttt 1440 
taacaggttg gaagtacaag cagatgtaca gaaagaacgg tacagtctaa gtggagaatc 1500 
tggcacagtc agcttgggaa cagttagtga taatgccagc accaaagcaa tggcaggatc 1560 
cattctgaat tcctacatcc cattggacaa agaaggcaac agtatggagg tgcaagtaga 1620 
tattgagtca aagccatcca aattcaggca caacagtgga agcagtagtg tggatgatgg 1680 
cagtgccacc cgaagtcatg ctggcggttc atccagtggc ttgcctgaag gtaaatctag 1740 
tgccaccaag tggtccaaag aagcaacagc agggaaaaaa tcaaaaagtg gtaaactgag 1800 
gaaaaagggt aacatgaaga taaatgagac gagagaggac atggatgcac agttgttaga 1860 
acaacaaagc acgaactcaa gtgaatttga ggctccatcc ctcagtgaca gtatgccttc 1920 
tgtagcagat tctcactcta gtcatttttc tgaatttagt tgttctgacc tagaaagcat 1980 
gaaaacttct tgtagtcatg gttccagtga ttatcacacc cgctttgcta ctgttaacat 2040 
tcttcctgag gtagaaaatg accgtctgga aaattcccca catcagtgta gcatttctgt 2100 
ggttacccaa actgcttcct gttcagaagt ttcacagttg aatcatattg ctgaagaaca 2160 
tggtaacaat ggaataaaac ctaatgttga tttatatttt ggcgatgcac taaaagaaac 2220 
aaataacaac cactcacatc agacaatgga attaaaagtt gcaattcaga ctgaaattta 2280 
ggcccataaa tgctgcagaa taattaccac tgtacaaccg tgtttggagc tggttgaact 2340 
acatgtgact acttaagttt caggttacca gcaaaagccg ggtttcatta tcataatgca 2400 
gatacatttt ctgtgttcag caaggcattg tgtgtcatgt gggatcttag ttaccaaact 2460 
atgaagtgaa ggctttaaaa gtgcattatt ttaaggataa taaatttgaa gagcaaagca 2520 
tgttttgtgt gtttgccaca aaacattgct tgaagcacat acttagatag aaattggtct 2580 
taatttatat aatcaatata aaatactaat gcaattctac agcattcaaa tgaagaaaac 2640 
ttgaggcttt agggataagt ggttagtgat attttattga aaccactaaa gagataagtt 2700 
taaaagaact gcataggtta ctctcagtat atgatactct gtaacatttc tatttatatc 2760 
ggcataaatt tcattttttt tcttcatatg caatgtggtt atataaagct taatgcagct 2820 
catttgctac catttggata cttagacact ttgagcaaga ttgtggcagt ttttgcacaa 2880 
ctttgaaata gaaatacctg gtactctatc ttgtttattg ttgatgccat cttagaggaa 2940 
aaaatgtaaa ggtaagtaat taagcatatg acagcaacaa ataagataca taaaactaca 3000 
aaataaagtc ccattaggtt ataagtatta caaaaaatcc acctttctct aaggggaagt 3060 
ttgtacccca ttgattcttg gtgcctttgg gatcgactgg gttttaatgg cctagttatt 3120 
tgaggatttt gctgtgttgt tttccatgtc ttctctggtc accttggatt atatataaaa 3180 
atacaggaaa tagataaaca tgaatgtgat taataatgct gaaaaagtat tagcctacca 3240 
aagacacact caggctttag tgaataactt tacataacct cagtttttaa cacatgcata 3300 
tcttctccaa ccatgaaatc aaagcacggt gcagaacttg taccaagtac aaaaggtcca 3360 
tgtatgatta gcattatttt cttttgcttt tgtttatgga caatgttcag ctgacataag 3420 
cagaagttgg ccaaaatact gcctgtactg ttaatttcct gtataattca cttaaataaa 3480 
agcaggttaa cctcaatgat agcagttaaa atgttctatc ttatgtattt cttttaagta 3540 
ttaccattat ggtgctactg agcgttttct tttggtaaaa agaaaaatgc catgggctgc 3600 
agtcttcttc catcactttt ccctaccagg tccattaata tgcttataac actagtgcca 3660 
gttattttat ttgataatgc ttatggtatt tgtatatttg tttaagtgaa ttatacagga 3720 
aattaacagt acaggcagta ttttggccaa cttctgctta tgtcagctga acattgtcca 3780 
taaacaaaag caaaagaaaa taatgcgtcg acg 3813 

<210> 30 

<211> 882 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 243794. l.j 

<400> 30 

ggcccaaact gagttgcgtt gcctcttctt gcaaatgaac ttgctccgta aaattgagaa 60 
cctggaacct ctgcagaaac tggatgctct taacctcagc aacaattaca tcaagaccat 120 
tgaaaacctc tcctgcctcc cagtcctgaa cacattgcag atggcccaca atcacctgga 180 
gaccgtggag gacattcagc atctacaaga gtgtttgagg ctttgtgtcc ttgacctttc 240 
gcacaacaag ctgagtgatc cggaaatatg gcctcaatat gtgccgccag tgtttccgtc 300 
agtacgcgaa ggatatcggt ttcattaaga aagacctgag ctgtcttcct tggcactgcc 360 
tatggaggtg acacccatct cctccatcat ggccatcctg agaccgctcg cgaagcccaa 420 
gatcatcaaa aagagcacca agttcactgg gaaccagtca gactgatatg tcaaaattaa 480 
gggtaactgg tggaaacaca gaggtattga caacagggtt catagaaggt ttgagggcca 540 
gatctatgcc caacattggt tatgggagaa acaaaaagac aaagcacata ctgcccagtg 600 
gcttctggaa gttcctggtc cacaacgtta aggagctgga agtactgctg gtgagcaaca 660 
aatcttactg tgttgagatc actcatgatg tttcttccaa gaactgcaaa gccatcttgg 720 
aaagagcagc ccaggtggtc atcagagtca ccaatgccaa tgccagcctg cacagtgcag 780 
aaagtgaata gacagtgaat gtgtttgttt tattggggtt taaataaaac caataaaact 840 
gtaaaagcag cggcaacaaa aaccagaaaa aaaaagtcga cg 882 

<210> 31 
<211> 514 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 

<223> Incyte ID No: 442085. l.j 

<400> 31 

ctcgttcctg tcgcgcagca cggacctcca cttccacatc tcccccggcg tcggcgcggt 60 
cagttgaacc atggcggact ccaaggccac ctcggcggtc accctccgca cccgcaagtt 120 
catgaccaac cgcctcctgg cccgcaagca attcgtgctt gaggtgatcc accccggccg 180 
cgccaacgtc tccaaggcgg agttgaagga gaggcttgcc aaggcgtacg aggtgaagga 240 
ccccaacacc atctttgtct tcaagttccg cacccacttc ggaggaggaa agtccactgg 300 
tttcggcctc atctacgaca acctcgaggc tgccaagaag ttcgagccga aataccgcct 360 
catcaggaat ggtcttgcta ctaaggttga gaagtcccgc aagcaaatga aggagcggaa 420 
gaacagggcc aagaagatcc gtggtgtcaa gaagaccaaa gctggtgacg ccaagaagaa 480 
gtaaacgttc gtttacattt gtattactgt tctg 514 
<210> 32 
<211> 766 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 370661. 3. j 

<220> 

<221> unsure 
<222> 48, 354 

<223> a, t, c, g, or other 
<400> 32 

cgggaatccg ccgtttgcgc tgaggcaatg gcggcagctg cgccggtngc cgcggacgac 60 
gatgagcggc ggcggcggcc gggggctgca ctggaggact cccggtccca ggaaggggca 120 
aatggtgagg ccgagtcagg tgagctcagc cggcttcggg ctgagctggg caggcgccct 180 
ggcagaaatg gaaaccatga aggctgtggc agaggtgagc gagagcacga aggccgaggc 240 
tgtggctgcg gtgcacggca gtgccaagag gaggtggcct cgctgcaggc catcctgaaa 300 
gactccatca gcagctatga agcccagatc accgccctga agcaggagcg acanagcagc 360 
agcaggactg tgaggagaag gagcgggagc tgggccgcct gaagcagctg ctgtcccggg 420 
cct^accGGGu ggaC't^GCG'i^g gagaagcaga tggaaaaggt .♦•./?rrfrra.*r.tr<*#*fi»* a.*-r*«/?«t-,*-/^»^/^ 480 
tgctgccctg ggatttggaa gaatgttgag gactgtggat gtcccattcc ctggtcatgg 540 
ggtgcggggc acccagcagt gattctagga atagcacaga gttaccaggt gttgttgaga 600 
gcttggctct gtgctccgta gggtcctggc ttgcccctta ttagtgggtg accaagttcg 660 
cttagctcca aaattaactt aagttatgta gagggaaggg aacttacact taagggtaac 720 
aaaaaaaatt cttactgtta ttaaatattg acttctggtg tacatt 766 

<210> 33 

<211> 1416 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 427939. 17 o 

<220> 

<221> unsure 

<222> 1284, 1412 

<223> a, t, c, g, or other 

<400> 33 

attttttcta aatgtgagat caaggaatta ccaccaaaaa aggagagtaa tacaggagaa 60 
atattccaga cagtaatgtt ggaaagacat gaaagccacg acatacaaga tttttgcttc 120 
agagaaaccc agaaaaatgt acatgactct cagtgtctgt ggaaacatga ttgaagacat 180 
tataagcgag tgcgtgtgac ctataaggaa agtctcattg gtagaagaga catgcatggt 240 
agaaaggatg atgcacaaaa gcagcctgtt aaaaatcagc ttggattaaa cccgcagtca 300 
catctaccag aactgcagct atttcaagct gaagggaaaa tatataaata tgatcacatg 360 
gaaaaatctg tcaacagtag ttccttagtt tccccacccc aacgtatttc ttctactgtc 420 
aaaacccaca tttctcatac atatgaatgt aattttgtgg attcattatt cacacaaaaa 480 
gagaaagcaa atattgggac agaacactac aaatgtaatg agcgtggcaa ggcctttcat 540 
caaggcttac attttactat acatcaaata atccatacta aagagacgca atttaaatgt 600 
gatatatgtg gcaagatctt caataaaaaa tcaaaccttg caagtcatca aagaattcat 660 
actggagaga agccatataa atgtaatgaa tgtggcaagg tcttccataa tatgtcacac 720 
cttgcacagc atcgcaggat tcatactgga gagaaaccat ataaatgtaa tgaatgtggc 780 
aaggtcttta atcaaatttc acaccttgca caactatcaa aagaattcat accggagaga 840 
aaccttataa atgtaatgaa tgtggaaagg tcttccatca aatttcacac cttgcacaac 900 
atcggacaat tcatactgga gaaaaacctt acgaatgtaa caaatgtggc aaggtgttca 960 
gtcgcaattc ctaccttgta caacatctga tcattcatac tggagagaaa ccttacagat 1020 
gtaatgtatg tggaaaggtc ttccatcata tttcacacct tgcacaacat cagagaatcc 1080 
acactggaga gaaaccttac aaatgtaatg agtgtggcaa ggtcttcagt cacaagtcat 1140 
ccctagtaaa tcactggaga attcatactg gagagaaacc ttacaaatgt aatgagtgtg 1200 
gcaaggtctt cagtcacaag tcatccctag taaatcactg gagaatccac actggagaga 1260 
aaccttacaa atgtaatgaa tgtngcaagg tcttcagtcg caattcatac cttgcccaac 1320 
atctgataat tcatggccgg tgagaaacct tataagtgtg atgaatgtga caaagcattc 1380 
agtcaaaatt cacatgcttg tacaacatca cngaat 1416 

<210> 34 

<211> 441 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 430569. 2. j 

<400> 34 

aacacttgcg ttgtaatcaa ctacgtgaat aagtgttgtt attcttacct acagaagcca 60 
ggttcctgaa gatttcttgc atcacatctc tgtagagatt cttttgagaa ggatccagca 120 
aagcccactc ctcctgagta aaggccacag ccacatcttc aaatgacact gaatcctaga 180 
atatcgcaca tatgtggaga ggaggatggc ttagacagac agtactaaga atctatactc 240 
atctcataaa atcgtaacat aattctgcag acttcaaaca tttcttccat gacctggtca 300 
tcagaacttt actctctgtc tgcacttact gctgcccact caacattctt catgctaaaa 360 
tgaaactatt cagaaagtca acagcagagt aggaacacct gtctcattgg caggtgcagg 420 
gaataaactg tgctgaaaga a 441 

<210> 35 

<211> 275 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 444689. l.j 

<220> 

<221> unsure 
<222> 44 

<223> a, t, c, g, or other 

<400> 35 

ctatgcaggt tgttgtcggg ataccagatt tcctactccg agangccccg ggtccctctg 60 
ccacaacttc tgtcgctctg ccgcctgcac cgtgacccgc actattcacg ggagccctag 120 
agaggacacc gggacaccca gaagccggga aatgatgttt caggattcag tggcctttga 180 
ggatgtggct gtcagcttca cccaggagga gtgggctttg ctggatcctt cccagaagaa 240 
tctctacagg gatgtgatgc aggaaacctt caaga 275 

<210> 36 
<211> 517 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 
<223> Incyte ID No: 445198. l.j 

<220> 

<221> unsure 
<222> 48-49 

<223> a, t, c, g, or other 
<400> 36 

cttgcagaag gtggttatcc tttacggttc tccacactct ttcaggannc agcagaaaat 60 
gaacatatct caggcatcag tgtcattcaa ggacgtgact atagaattca cccaggagga 120 
gtggcagcaa atggcccctg ttcagaagaa tctgtacaga gatgtgatgc tggagaacta 180 
cagcaacctc gtctcagtgg ggtactgctg tttcaaacca gaggtgatct tcaagttgga 240 
gcaaggagag gagccttggt tctcagagga ggaattctca aaccagagtc acccaaaaga 300 
ttacagaggt gatgacctga tcaagcagaa caagaaaatc aaagacaaac acttggagca 360 
agcaatatgt atcaataata aaacattgac tacagaggaa gagaaagttt tggggaaacc 420 
atttactctg catgtagctg ctgttgcttc aacaaaaatg tcctgcaaat gcaactcatg 480 
ggaagtgaat ttgcaaagta tttctgaatt tatcatt 517 
<210> 37 

<211> 499 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 084399.1 

<220> 

<221> unsure 

<222> 28, 136, 141, 157 

<223> a, t, c, g, or other 

<400> 37 

ggacatgtcc aggcggaagc aggcgaangc cgctcggtga aagttgaaga gggggaggcc 60 
tcagacttct cgctggcctg ggattcctcc gtgacagcag caggaggcct agaaggagag 120 
ccagagtgcg atcagnaaac nagccgtgcg ctggaanaca ggaacagcgt gacaagtcaa 180 
gaggagagaa atgagggatg atgaagacat ggaggatgaa tcaatttaca cctgcgatca 240 
ctgtcagcag gacttcgagt ctctggcaga cctgacggac caccgggccc accgctgtcc 300 
tggaggtaat gcaaaacagc cccaggactc tgacaggtgg ttacacgggt aattggacat 360 
agcaacagct tgaagcctca agtgagatta aagaattaat aatgtaattt tacagttaat 420 

aaagcaacag ctttccggc 499 

<210> 38 

<211> 1017 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 350044.1 

<220> 

<221> unsure 
<222> 775 

<223> a, t, c, g, or other 
<400> 38 

ctggcggagg ccttgctgat gaacctgact gagggtcccc tggcgatggc agaaatggac 60 
cctacacagg gccgtgtggt ctttgaggac gtggccatat atttctccca ggaggagtgg 120 
gggcaccttg atgaggctca gagattgctg taccgtgatg tgatgctgga gaatttggcc 180 
cttttgtcct cactaggttc ttggcatgga gctgaggatg aggaggcacc ttcacagcaa 240 
ggtttttctg taggagtgtc agaggttaca acttcaaagc cctgtctgtc cagccagaag 300 
gtccacccta gtgagacatg tggcccaccc ttgaaagaca ttctgtgcct ggttgagcac 360 
aatggaattc atcctgagca acacatatat atttgtgagg cagagctttt tcagcaccca 420 
aagcagcaaa ttggagaaaa tctttccaga ggggatgatt ggataccttc atttgggaag 480 
aaccacagag ttcacatggc agaggagatc ttcacatgca tggagggctg gaaggactta 540 
ccagccacct catgccttct ccagcaccag ggccctcaaa gcgagtggaa gccatacagg 600 
gacacagagg acagagaagc ctttcagact ggacaaaatg attacaaatg tagtgaatgt 660 
gggaaaacct tcacctgcag ctattcattt gttgagcacc agaaaatcca cacaggagaa 720 
aggtcttatg aatgtaacaa atgtgggaaa ttctttaagt acagtgccaa tttcntgaaa 780 
catcagacag ttcacactag tgaaaggact tatgagtgca gagaatgtgg aaaatccttt 840 
atgtacaact accgactcat gagacataag cgagttcaca ctggagaaag accttatgag 900 
tgcagcgaat gccagaaggc ctttattaga aagtctcacc tggttcatca ccagaaaatc 960 
cacagtgaag agaggcttgt gtgctccatg aatgtgggaa ttctttagct aaaactc 1017 

<210> 39 

<211> 1231 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 441329.2 

<400> 39 

ttttgcttga taataactat aattactaca aggactgcct gagatgaaaa agacaggtga 60 
aatcgaactt tcacatggac tggtgtctga gctggagagt ctgtttgagt ttgagtaggg 120 
gaaagcctga aaccccccac cacctgctcc cgttagtccc tgcattaact aggggcggac 180 
tgggacctga tcctccaagc tggaggcttg aaaactcaag ggcaggcgcc tggtccagcg 240 
attcggaaac cgagatgcaa agtcctcccg cgctcctaga gaggacccgg aagcgcggcg 300 
cggtcccgga agacgaggtg gtgacacgct tccggccttt gtgactcatt gtgtctgtgt 360 
cgaggcgtcg ggagggccta agtccgtgtg cggtgccctt cggccggcct gagccccaga 420 
gtcagctccc ctttctcgcc cagcgccccc aggccgctcc cggggctcac ggaatagtaa 480 
agaaacacat cataaaacct cccaggacat aaaggtgagc acagaccctg tttggatcaa 540 
gtcagttcct ggagcctgaa tgatgactgc tgaatcacgg gaagccacgg gtctgtcccc 600 
acaggctgca caggagaagg atggtatcgt tatagtgaag gtggaagagg aagatgagga 660 
agaccacatg tgggggcagg attccaccct acaggacacg cctcctccag acccagagat 720 
attccgccaa cgcttcaggc gcttctgtta ccagsacact tttggggccc cgagaggctc 780 
tcagtcggct gaaggaactt tgtcatcagt ggctgcggcc agaaataaac accaaggaac 840 
agatcctgga gcttctggtg ctagagcagt ttctttccat cctgcccaag gagctccagg 900 
tctggctgca ggaataccgc cccgatagtg gagaggaggc cgtgaccctt ctagaagact 960 
tggagcttga tttatcagga caacaggtcc caggtcaagt tcatggacct gagatgctcg 1020 
caagggggat ggtgcctctg gatccagttc aggagtcctc gagctttgac cttcatcacg 1080 
aggccaccca gtcccacttc aaacattcgt ctcggaaacc ccgcctctta cagtcacgag 1140 
gtaagaagca aggtttcatt taggggaagg gaaatgattc aggacgagag tctttgtgct 1200 
gctgagtgcc tgtgatgaag aagcatgtta g 1231 

<210> 40 

<211> 730 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 442401.2 

<220> 

<221> unsure 
<222> 631 

<223> a, t, c, g, or other 
<400> 40 

gcggcttccg ggatttggcg gtggcctttg ttggctgcag taagagctca gtctcttcac 60 
caggggctcc cagtccttcc atctgggagg ccaaggcggc ttcgcgttct gagaatagac 120 
agaacctctg ttactctgtg accggcaggc accgggagat ccgtagctca gacgccagga 180 
catcccggaa gctgggaaat ggtgaatgtg ccagggactg ttgacattca gggatgtggc 240 
catagaattc tctcgggagg agtgggaaca cctggactca gatcagaagc ttttatatgg 300 
ggatgtgatg ttagagaact acggaaacct ggtctctctg ggtctcgctg tctctaagcc 360 
ggacctgatc acctttttgg agcaaaggaa agagccctgg aatgtgaaga gtgcagagac 420 
agtagccatc cagccagata tcttttctca tgatactcaa ggcctcttaa gaaagaagct 480 
tatagaagca tcattccaaa aagtgatatt ggatggatat gggagctgtg gccctcagaa 540 
tttaaactta aggaaagagt gggaaagtga gggcaaaata atcctatggt gaaaaaaaat 600 
caacaagata atctctgcat gagaaaaagg nccaaagaca agtgaatttt ctgagggtga 660 
tgaaagtgtt gcttcgaaaa ggggtgaagt ttatatgggt ctatttattt gtctaacatg 720 
tacagttaag 730 

<210> 41 
<211> 575 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 444933.2 

<400> 41 

gggtccggcg ctccagaaca gaacgatccc tgaggctccc ttgctcgaac tgtgggactt 60 
accctactat ggtccgagcc taccctattt cattatactc aagtaacgcc ccagaaattc 120 
cagagaatct cacacaaaga ggttgagtct tgccgtggtg ccttcagggg aatgtcatcc 180 
cgggctagaa gagctgcaaa aggctgtcag gcttctcaga actttgcttc tccagcagaa 240 
taatcctgcg gaagactgag cagttcttgt gagtgtaaaa ccatggccca tgcattggtg 300 
acgttcaggg atgtggctat agacttctct cagaaggaat gggagtgcct ggacactacc 360 
cagaggaaat tgtacagaga tgtgatgttg gagaattata ataacttggt ctcactggga 420 
tattctggct caaagccaga tgtgattacc ttactggagc aagggaaaga gccctgcgtg 480 
gtggcgaggg atgtgacagg aagacagtgc cccggtttgt tatccaggca taagaccaag 540 
aaattatctt cagaaaagga cattcatgaa atcag 575 

<210> 42 

<211> 734 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 481129.4 

<400> 42 

ggcgtgcgag actcggcggg cgctgttgag ggagtcgggc cgcgactgtg gtcgttttta 60 

taccttcccg cgcggacgcc ggcgctgcca acggaagggc gggtagggcg gtgcgtgatt 120 

aggttggcga agagacggag tttcgtcatg ttggccaggc ccatttgaga tctttgaaga 180 

tatcctcaac gtgaggctct gctgccatga aggtgaagat taagtgctgg aacggcgtgg 240 

ccacttggct ctgggtggcc aacgatgaga actgtggcat ctgcaggatg gcatttaacg 300 

gatgctgccc tgactgcaag gtgcccggcg acgactgccc gctggtgtgg ggccagtgct 360 

cccactgctt ccacatgcat tgcatcctca agtggctgca cgcacagcag gtgcagcagc 420 

actgccccat gtgccgccag gaatygaayt tcaaggagtg aggcccgacc tggctctcgc 480 

tggaggggca tcctgagact ccttcctcat gctggcgccg atggctgctg gggacagcgc 540 

ccctgagctg caacaaggtg gaaacaaggg ctggagctgc gtttgttttg ccatcactat 600 

gttgacactt ttatccaata agtgaaaact cattaaacta ctcaaatctt gctggaggcc 660 

tctgggtgcc tgtgttctcg gcatatagat gtggtctcgg tgtgttttga tatgaaaact 720 

ctaaatgaat aaac 734 

<210> 43 

<211> 1104 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 481999.1 

<220> 

<221> unsure 

<222> 50, 201, 298, 905-906, 955, 961, 966, 979, 1001, 1021, 1028, 1032, 
1074 

<223> a, t, c, g, or other 
<400> 43 

ctgtcatcca tgctgatatg ctttcatttt gctgtgatag ggagagaatn cacatgtgct 60 
tctcggaatt tgtgctgtca ctggagcatt tatttgagtg tctataatgt attcagggtt 120 
gtgctaaaac ctgtcagtag agaacaataa gaagaagtca caatctagtt gtgctgagat 180 
atattcagtt aggtgaaaga ngtgctcttc aagccatatg actagatgag atcaccggtg 240 
aatatgtaca gacaggggag aagaaagtga ggaccatcaa gtgataggag accaaagngt 300 
cttgtcaagg catgccaaga caaggtccct cattctccag tcaacatagt ggagaacgta 360 
cttgcctgac gccagcatgt gtgtgatggt ttaggactca gtggctttag aggatgtggc 420 
tgtgaacttc acccgagaag agtgggcttt gctgggtcct tgtcagaaga atctctacaa 480 
agatgtgatg ctggagaact acatgaacct ggcctctgtg gaatgggaaa tacaacctag 540 
aaccaaacgg tcatcacttc agcagggttt tttgaagaat caaatattca gtgggataca 600 
aatgacaaga ggctacagtg gatggaaact ctgtgactgt aagaattgtg gagaggtctt 660 
cagggaacag ttttgcctta agacacacat gagagttcag aatggaggga atacttctga 720 
gggtaattgt tatggaaaag acaccctcag tgtgcacaag gaagcctcta ctggacagga 780 
actttccaaa tttaatccat gtggaaaagt ctttactcta actccaggtc ttgctgtaca 840 
tcttgaagtt ctcaatgcaa gacaacccta caaatgtaag gaatgtggaa aaggctttaa 900 
gtatnntgca agccttgata atcatatggg aatccacact gatgagaaac tctgngaatt 960 
ncaggnatat gggagagcng tcacagcttc ttcacaccta nagcagtgtg tagcagttca 1020 
nacagganag anatccaaaa agactaagaa atgtgggaaa tccttcacta attnttctca 1080 
actttatgca cctgtgaaaa ctca 1104 

<210> 44 
<211> 665 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 233814. l.j 

<400> 44 

ggagtgtgga agagccatcc agagtgagtg cctggggcag agtgagagca ggaccagcgg 60 
gcaggtggcg gacgggggtc ggtggtggat ggcccagccc tggaaagccg aggtgaaggt 120 
tacgagtccc tgaggatctg tgggtgtgag ccagccgcgt ggccctgggc aacggcgctg 180 
ccggacagaa gtggcggttg ctgacgcctg gaaattcccc tgaaggtgga gcaccaccca 240 
acccccctgg gtcccaccct ccctcaaggc ctcctccacc tccacctcca ccccgcctgg 300 
cctggcgtcc acctctgcgg ctcctacctg ggtgcaatcg agttaaatgg ctgataagca 360 
gatcagcctg ccagccaagc tcatcaatgg cggcatcgcg gggctgatcg gtgtcacctg 420 
cgtgtttacc atcgacctgg ccaagacaag gctgcagaac cagcagaacg gccagcgcgt 480 
gtacacgagc atgtccgact gcctcatcaa gaccgtccgc tccgagggct acttcggcat 540 
gtaccgggga gctgctgtga acttgaccct cgtcaccccc gagaaggcca tcaagctggc 600 
agccaacgac ttcttccgac atcagctctc taaggacggg cagaagcttg accctgcttt 660 
aaaga 665 

<210> 45 
<211> 580 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 351376. 4. j 

<400> 45 

agacactgct tgctgcggca gagacgccag aggtgcagct ccagcagcaa tggcagtgac 60 
ggcgttggcg gcgcggacgt ggcttggcgt gtggggcgtg aggaccatgc aagcccgagg 120 
cttcggctcg gatcagtccg agaatgtcga ccggggcgcg ggctccatcc gggaagccgg 180 
tggggccttc ggaaagagag agcaggctga agaggaacga tatttccgac attacaggtt 240 
atgctttgag atctctttgg ggtgaaggat tgaaattaaa ccctgagcca ccgtgtcctt 300 
gtagagcaca gagtagagaa caactggcag ctttgaaaaa acaccatgaa gaagaaatcg 360 
ttcatcataa gaaggagatt gagcgtctgc agaaagaaat tgagcgccat aagcagaaga 420 
tcaaaatgct aaaacatgat gattaagtgc acaccgtgtg ccatagaatg gcacatgtca 480 
ttgcccactt ctgtgtagac atggttctgg tttaactaat atttgtctgt gtgctactaa 540 
cagattataa taaattgtca tcagtgaact gtgaaaaaaa 580 

<210> 46 

<211> 1935 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 338992.1 

<220> 

<221> unsure 

<222> 52, 55, 83, 88, 96, 208, 511 
<223> a, t, c, g, or other 

<400> 46 

tgaaagtact tttactccca agggtatctg atctgatgga gagaatgtat cnctngtacc 60 
gatgcggata gaacgctcat tanttacngc acagtnttca taaattccaa cttccctagg 120 
aattttgtct tttgaaggca tgttggaagt tggggcatta ggatataatc tcagtccatg 180 
tactaagcct ctgcataact gaaagatnga gtcaataaaa ttcactctct cattttttgt 240 
ttttaatctc aagaattttt gccttgaagg aaaaatgtag tggagaattt gatgtttgaa 300 
cgaatcccag tcagaagttc cagcctgcca ctgttctctg atgccatgcc agcaccaact 360 
caactgtttt ttcctctcat ccgtaactgt gaactgagca ggatctatgg cactgcatgt 420 
tactgccacc acaaacatct ctgttgttcc tcatcgtaca ttcctcagag tcgactgaga 480 
tacacacctc atccagcata tgctaccttt ngtcaggcca aaggagaact ggtggcagta 540 
cacccaagga aggagatatg cttccacacc acatacagtc cccctcacac ctccgcaagt 600 
caatagcatc cttaaagcta atgaatacag tttcaaagtg ccagactttg acggcaaaaa 660 
tgtcagttct atcctgtcga tttgacagca atcagctgcc tgcaaatgca cccattgagg 720 
accggagaag tgcagcaacc tgcttgcaga ccagagggct ccttttgggg gtttttgatg 780 
gccatgcagg ttgtgcttgt tcccaggcag tcagtgaaag actcttttat tatattgctg 840 
tctctttgtt accccatgag actttgctag agattgaaaa tgcagtggag agcggccggg 900 
cactgctacc cattctccag tggcacaagc accccaatga ttactttagt aaggaggcat 960 
ccaaattgta ctttaacagc ttgaggactt actggcaaga gcttatagac ctcaacactg 1020 
gtgagtcgac tgatattgat gttaaggagg ctctaattaa tgccttcaag aggcttgata 1080 
atgacatctc cttggaggcg caagttggtg atcctaattc ttttctcaac tacctggtgc 1140 
ttcgagtggc attttctgga gccactgctt gtgtggccca tgtggatggt gttgaccttc 1200 
atgtgggcca atactggcga tagcagagcc atgctgggtg tgcaggaaga ggacggctca 1260 
tggtcagcag tcacgctgtc taatgaccac aatgctcaaa atgaaagaga actagaacgg 1320 
ctgaaattgg aacatccaaa gagtgaggcc aagagtgtcg tgaaacagga tcggctgctt 1380 
ggcttgctga tgccatttag ggcatttgga gatgtaaagt tcaaatggag cattgacctt 1440 
caaaagagag tgatagaatc tggcccagac cagttgaatg acaatgaata taccaagttt 1500 
attcctccta attatcacac acctccttat ctcactgctg agccagaggt aacttaccac 1560 
cgattaaggc cacaggataa gtttctggtg ttggctactg atgggttgtg ggagactatg 1620 
cataggcagg atgtggttag gattgtgggt gagtacctaa ctggcatgca tcaccaacag 1680 
ccaatagctg ttggtggcta caaggtgact ctgggacaga tgcatggcct tttaacagaa 1740 
aggagaacca aaatgtcctc ggtatttgag gatcagaacg cagcaaccca tctcattcgc 1800 
cacgctgtgg gcaacaacga gtttgggact gttgatcatg agcgcctctc taaaatgctt 1860 
agtcttcctg aagagcttgc tcgaatgtac agagatgaca ttacaatcat tgtcgfctcag 1920 
ttcaattctc atgtt 1935 



<210> 47 

<211> 1709 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 200587. 3, j 



<400> 47 

gcccggcagc agtggggcgg ggatggaggc ggccgtggcg ccggggaggg atgcgccggc 60 
acccgcggcg agtcagccca gcggctgcgg gaaacacaac tcgccggaga ggaaagttta 120 
tatggactat aatgcaacga ctcccctgga gccagaagtt atccaggcca tgaccaaggc 180 
catgtgggaa gcctggggaa atcccagcag cccgtattca gcaggaagaa aggccaagga 240 
tattataaat gcagctcggg aaagcctcgc gaagatgata ggggggaaac ctcaagatat 300 
aatcttcact tccgggggca ctgagtcaaa taatttagta atccattctg tggtgaaaca 360 
tttccacgca aaccagacct caaagggaca cacaggtggg caccacagcc cagtgaaggg 420 
ggccaagccc catttcatta cttcctcggt ggaacacgac tccatccggc tgcccctgga 480 
gcacctggtg gaagaacaag tggcagcggt cacctttgtc ccggtgtcca aggtgagcgg 540 
gcaggcagag gtggacgaca tcctcgcggc agtccgcccg accacacgcc tcgtgaccat 600 
catgctggcc aacaatgaga ctggcattgt catgcctgtc cctgaaatca gtcagcgcat 660 
taaagccctg aaccaggaac gggtggcagc tgggctacct cccatcctcg tgcacacgga 720 
tgctgcacag gccttgggga agcagcgcgt ggatgtggag gacctgggcg tggacttcct 780 
tacaatcgtg gggcacaagt tttatggtcc caggattggc gcactttata tacgaggact 840 
tggtgaattt acccctctct accctatgct atttggaggt ggacaagaac ggaatttcag 900 
gccagggaca gagaacaccc caatgattgc tggccttggg aaggtgagcc ctggtgagct 960 
tgaggagatg agttgattgc cttgtcatca gtgtccccag cggctctagt aggaatctgg 1020 
ggctttaggg agatcatatt cgtgatctca taagcggact gacaagaaaa agcctggaat 1080 
ctgatttatc aacctggaga gctgtacttt ttcagatctg tggattagtg gcacttgttt 1140 
agcatttaaa agggaaaaac tccttgtctg tgtctcttct gacatgagaa acagtgatac 1200 
tgttacagca gaggtgaagt gtcaagctgt aggttcacca ttcccaaagg cgtcggcagc 1260 
ccccgcacca tcgacgttct gtccccgatt ggtctgtggg gtggccaatc cagcacctgc 1320 
caccagccct ccactggggc ctgtcctgct tctgcgacag taaccaggct caggttttgc 1380 
cgtgcgaagg cagaggtact ggttacacct cggcgtctgc aggctgctcc ctcactgtgc 1440 
ggcctctggt caaaggccac caccttcaag caaccctcca tgaaccacag ctacattggt 1500 
tcccccatgt cccctgctgt caccctctat ccctgacact gctttatttc ccctatctga 1560 
tgtttctctg tagtgtgttc agatgtatgc cccaccttct gcttccccga gagcaggggc 1620 
ttggccatgt cggcctcagt atccccagta cagagaaagg gcttagcgcc caggaggccg 1680 
ggccttggga agagtatgaa tggattccc 1709 

<210> 48 
<211> 766 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 
<223> Incyte ID No: 246727. 5. j 

<400> 48 

ggccgaggtg cgggtcgcct ccagaggtgc gtggtcgtgg cgcgagggat cctgaggctg 60 
ctccagcagt gcgccgccgc cgtctcctgg ggcggcttgg gttagccggg aggtgggtca 120 
gaggctcgcc ccgtgccctg cgtccgggcg tctccttagg gcgcgtcttc gggtccggcc 180 
cLgcggtgctg aaaaagggag aaggttgggg gtagggagga aacaagatcc cagttcaata 240 
gatttctccg cagatcctgt gccttcaaac cctacgagtc catactttaa aacaaaatga 300 
agaaagtaag gcttaaggaa ctagagagtc gcctgcaaca agtggatgga tttgaaaagc 360 
ccaagctact tctggaacag tatcctacca ggccgcacat tgcagcatgt atgctctata 420 
caatccataa cacttatgat gacattgaaa ataaagtcgt tgcagatcta ggatgtggtt 480 
gtggagtact tagcatcgga actgcaatgt taggagcagg ggacagatat ggcttttcta 540 
aagactgctt tggaaatggc aagaacagca gtatattcct tacacaaatc ctcaactaga 600 
gaacatgttc aaaagaaagc tgcagaatgg aaaatcaaga tagatattat agcagaactt 660 
cgatatgacc tgccagcatc atacaagttt cacaaaaaga aatcagtgga cattgaagtg 720 
gacctaattc ggttttcctt ttaaaagccc cgcaaacaaa agtcgt 766 

<210> 49 
<211> 2757 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 407087. 3. j 

<220> 

<221> unsure 

<222> 889, 917, 1054, 1098, 1122 
<223> a, t, c, g, or other 

<400> 49 

gcgggagcaa gtgcggctgg agctgagctt cgtcaactca gacctgcaga tgctcaagga 60 
agagctggag gggctgaaca tctcggtggg cgtctatcag aacacagagg aggcatttac 120 
gattcccctg attcctcttg gcctgaagga aacgaaagac gtcgactttg cagtcgtcct 180 
caaggatttt atcctggaac attacagtga agatggctat ttatatgaag atgaaattgc 240 
agatcttatg gatctgagac aagcttgtcg gacgcctagc cgggatgagg ccggggtgga 300 
actgctgatg acatacttca tccagctggg ctttgtcgag agtcgattct tcccgcccac 360 
acggcagatg ggactcctgt tcacctggta tgactctctc accggggttc cggtcagcca 420 
gcagaacctg ctgctggaga aggccagtgt cctgttcaac actggggccc tctacaccca 480 
gattgggacc cggtgtgatc ggcagacgca ggctgggctg gagagtgcca tagatgcctt 540 
tcagagagcc gcaggggttt taaattacct gaaagacaca tttacccata ctccaagtta 600 
cgacatgagc cctgccatgc tcagcgtgct cgtcaaaatg atgcttgcac aagcccaaga 660 
aagcgtgttt gagaaaatca gccttcctgg gatccggaat gaattcttca tgctggtgaa 720 
ggtggctcag gaggctgcta aggtgggaga ggtctaccaa cagctacacg cagccatgag 780 
ccaggcgccg gtgaaagata acatccccta ctcctgggcc agcttagcct gcgtgaaggc 840 
ccaccactac gcggccctgg cccactactt cactgccatc ctcctcatng accaccaggt 900 
gaagccaggc acggatntgg accaccagga gaagtgcctg tcccagctct acgaccacat 960 
gccagagggg ctgacaccct tggccacact gaagaatgat cagcagcgcc gacagctggg 1020 
gaagtcccac ttgcgcagag ccatggctca tcangaggag tcggtgcggg aggcgagcct 1080 
ctgcaagaag ctccgganat tgaggtgcta cagaaggtgt gngtgccgca caggaacgct 1140 
cccggctcac gtacgcccag caccaggagg aggatgacct gctgaacctg atcgacgccc 1200 
ccagtgttgt tgctaaaact gagcaagagg ttgacattat attgccccat tctccaagct 1260 
gacagtcacg gacttcttcc agaagctggg ccccttatct gtgttttcgg ctaacaagcg 1320 
gtggacgcct cctcgaagca tccgcttcac tgcagaagaa ggggacttgg ggttcacctt 1380 
gagagggaac gcccccgttc aggttcactt cctggatcct tactgctctg cctcggtggc 1440 
aggagcccgg gaaggagatt atattgtctc cattcagctt gtggattgta agtggctgac 1500 
gctgagtgag gttatgaagc tgctgaagag ctttggcgag gacgagattc gagatgaaag 1560 
tcgtgagcct cctggactcc acatcatcca tgcataataa gagtgccaca tactccgtgg 1620 
gaatgcagaa aacgtactcc atgatctgct tagccattga tgatgacgac aaaactgata 1680 
aaaccaagaa aatctccaag aagctttcct tcctgagttg gggcaccaac aagaacagac 1740 
agaagtcagc ccagcacctt gtgcctccca tcggtcgggg ctgcacggcc tcaggtcaag 1800 
aagaagctgc cctccccttt cagccttctc aactcagaca gttcttggta ctaatgtgag 1860 

gaaacaaaca tgttcaggcc ctgaacattt ccggtgctga ctcggcctta aacgtttgtg 1920 

ccataatgga aaatatctat ctatctgttc tcaaatcctg tttttctcat agtgtaaact 1980 

cacatttgat gtgtttttat gaaggaaagt aaccaagaaa cctctaggaa ttagtgaaaa 2040 

aagaactttt ttgaggtgtg ttactatact gctgtaagtt atttattata taaagtattg 2100 

taaatagaat agtgttgaag atatgaaata tggctatttt taatggtgac aattatgact 2160 

tttagtcact attaaattgg ggttacctat atcagtacaa tttgtagttg tttccaggtt 2220 

tggctaataa tcattcctta acctagaatt cagatgatcc tggaattaag gcaggtcaga 2280 

ggactgtaat gatagaatta aattagtgtc actaaaaact gtcccaaagt gctgcttcct 2340 

aataggaatt cattaaccta aaacaagatg ttactattat atcgatagac tatgaatgct 2400 

atttctagaa aaagtctagt gccaaatttg tcttattaaa taaaaacaat gtaggagcag 2460 

cttttcttct agtttgatgt catttaagaa ttactaacac agtggcagtg ttagatgaag 2520 

atgctgtcta caaggtagat aatatactgt ttgatactca aaacattttt cattttgttt 2580 

aaagtagaag ttacataatt ctatatttta agtcttgggt aaaaaagtag ttttacattt 2640 

tataaagtaa agatgtaaat gattcaggtt taaagctcta tttgacttcc tttttttgtt 2700 

tgagatagcg tcttgctgtg ttgcccaggc tgggagtgca gtgggtgtga tctcgag 2757 

<210> 50 
<211> 558 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 

<223> Incyte ID No: 44177 9. l.j 

<400> 50 

atggaggagc tatggacgtc gcaatgcacg cgtacgtaaa gctcggaatt cggctcgagg 60 

actcaggaag caatcatggt gctctctgca gctgacaaaa ccaacatcaa gaactgctgg 120 

gggaagattg gtggccatgg tggtgaatat ggcgaggagg ccctacagag gatgttcgct 180 

gccttcccca ccaccaagac ctacttctct cacattgatg taagccccgg ctctgcccag 240 

gtcaaggctc acggcaagaa ggttgctgat gccctggcca aagctgcaga ccacgtcgaa 300 

gacctgcctg gtgccctgtc cactctgagc gacctgcatg cccacaaact gcgtgtggat 360 

cctgtcaact tcaagttcct gagccactgc ctgctggtga ccttggcttg ccaccaccct 420 

ggggatttca cacctgccat gcacgcctct ctggacaaat tccttgcctc tgtgagcacc 480 

gtgctgacct ccaagtaccg ttaagccacc tcctgtcggg cttgccttct gaccaggccc 540 

ttcttccgtc ccctgaac 558 

<210> 51 

<211> 905 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 206603.1 

<220> 

<221> unsure 
<222> 374 

<223> a, t, c, g, or other 
<400> 51 

gggccgcggc cggcagaagg gctgttagga gggaccacgc gccgggggcc gcgatctctg 60 
gcagggggcg gtgtgccagc ggagcaccat gcacataggc gcccagcgcc ccgactaccc 120 
ctcccgagga aaagaggccg gggccgcgct ggggcggcgg agagcatgag ggaggccggg 180 
gggcggctcg gcttggagcg ctgctaggga gcggtgcgcg ccgcacaccc gcctgggcgc 240 
ggcggagggc ggggagccgg gcaggtcgcg cctgcgggcg gcagccgacc gccgggagct 300 
gttctgattt ccgacgcgca cgctaggggc ccggagcagc ccccggcccc ggcgcgccgc 360 
cgacatgggc aacngcaggg agcattggat tcgcagcaga ccgatttcag ggcgcacaac 420 
cgtgcctttg aagctgccga tgccagagcc aggtgaactg gaggagcgat ttgccatcgt 480 
gctgaatgct atgaacctac ctcctgacaa agccaggtta ctgcggcagt atgataatga 540 
gaaaaaatgg gaactgattt gtgatcagga acgattccag gtgaagaatc ctccccatac 600 
atacattcaa aagctcaaag gctatctgga tccagctgta accaggaaga aattcagacg 660 
gcgtgttcaa gaatctacac aagtgctaag agaactggaa atttctttga gaactaacca 720 
cattggatgg gtcagagaat ttctgaatga agaaaacaaa ggtcttgatg ttctagtgga 780 
atatctctca tttgcacagt acgcgagaac ttttgacttt gaaagtgtgg agagtactgt 840 
ggagagctcg gtggacaaat caaagccctg gagtaggtcc atcgaggacc tgcacagagg 900 
gagca 905 

<210> 52 

<211> 2160 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 435694.2 

<220> 

<221> unsure 

<222> 484, 488, 491, 494 

<223> a, t, c, g, or other 

<400> 52 

cgttcttttc ctttctcctt gaaagagtga atgcatttgg tcgcagggct aaagaggagg 60 
atgctatact tttctaaatg gcaagagatg gggagagaag gggattaaga gttgacccgc 120 
aacctcccgg ggattctttg ttcttaccag atctcttggc cactccccta ttctgaagtc 180 
gtcttggctc tcttgactgc tcccctattc tgaagtcgtc ttggctctct tgactgctcc 240 
cctattctga agtcgtcttg gctctcctga ctacactatt tcaaggaatg atcaccaaga 300 
cacacaaagt agaccttggg ctcccagaga agaaaaagaa gaagaaagtg gtcaaagaac 360 
cagagactcg atactcagtt ttaaacaatg atgattactt tgctgatgtt tctcctttaa 420 
gagctacatc cccctctaag agtgtggccc atgggcaggc acctgagatg cctctagtga 480 
aganaaanaa naanaaaaag aagggtgtca gcaccctttg cgaggagcat gtagaacctg 540 
agaccacgct gcctgctaga cggacagaga agtcacccag cctcaggaag caggtgtttg 600 
gccacttgga gttcctcagt ggggaaaaga aaaataagaa gtcacctcta gccatgtccc 660 
atgcctctgg ggtgaaaacc tccccagacc ctagacaggg tgaggaggaa accagagttg 720 
gcaagaagct caaaaaacac aagaaggaaa aaaagggggc ccaggacccc acagccttct 780 
cggtccagga cccttggttc tgtgaggcca gggaggccag ggatgttggg gacacttgct 840 
cagtggggaa gaaggatgag gaacaggcag ccttggggca gaaacggaag cggaagagcc 900 
ccagagaaca caatgggaag gtgaagaaga aaaaaaaaat ccaccaggag ggagatgccc 960 
tcccaggcca ctccaagccc tccaggtcca tggagagcag ccctaggaaa ggaagtaaaa 1020 
agaagccagt caaagttgag gctccggaat acatccccat aagtgatgac cctaaggcct 1080 
ccgcaaagaa aaagatgaag tccaaaaaga aggtagagca gccagtcatc gaggagccag 1140 
ctctgaaaag gaagaaaaag aagaagagga aagagagtgg ggtagcagga gacccttgga 1200 
aggaggaaac agacacggac ttagaggtgg tgttggaaaa aaaaggcaac atggatgagg 1260 
cgcacataga ccaggtgagg cgaaaggcct tgcaagaaga gatcgatcgc gagtcaggca 1320 
aaacggaagc ttctgaaacc aggaagtgga cgggaaccca gtttggccag tgggatactg 1380 
ctggttttga gaacgaggac caaaaactga aatttctcag acttatgggt ggcttcaaaa 1440 
acctgtcccc ttcgttcagc cgccccgcca gcacgattgc aaggcccaac atggccctcg 1500 
gcaagaaggc ggctgacagc ctgcagcaga atctgcagcg ggactacgac cgggccatga 1560 
gctggaagta cagccgggga gccggcctcg gcttctccac cgcccccaac aagatctttt 1620 
acattgacag gaacgcttcc aagtcagtca agctggaaga ttaaactcta gagttttgtc 1680 
cccccaaaac tgccacaatt gctttgatta ttccatttat gctggagatt acaaattttt 1740 
tttgtgaaaa aatcagatct tggtgaggac ctcgagcagt aagatataaa taactcccat 1800 
aagcttagcg ttccagtaat ggaacactag gcataaatgg tttattcagt tgtgcaaatg 1860 
aaagccatct gacagttggc tcacattgaa cacctgtgga gattaaggac gaggacaact 1920 
atattgatgg gcttggatga actggggcag ggcagctcat atttcgggag ccaggagaac 1980 
gagtgagtgc taaaacctcc tgttttctgt gttaaacatt ccgtccctgt ttgagacatc 2040 
agtatgtaca gttaactttt gttgagtgtt tagcaggtac tagggacata ctagtgtttt 2100 
ccttaatgta tttaatcttc ataattatga aatgggtgct attattagcc ccatcttata 2160 